METHOD AND APPARATUS FOR DATA ANALYSIS

FIELD OF THE INVENTION

The present invention relates generally to a method and
apparatus for data analysis. More specifically, the present
invention relates to a method and apparatus for analyzing data
and extracting and utilizing relational structures in different
domains, such as temporal, spatial, color and shape domains.



BACKGROUND OF THE INVENTION

Full motion digital image sequences in typical video
applications require the processing of massive amounts of data
in order to produce good quality visual images from the point
of view of shape, color and motion. Data compression is often
used to reduce the amount of data which must be stored and
manipulated. A data compression system typically includes
modelling sub-systems which are used to provide simple and
efficient representations of the large amount of video data.

A number of compression systems have been developed which are
well suited for video image compression. These systems can be
classified into three main groups according to their
operational and modelling characteristics. First, there is the
causal global modelling approach. An example of this type of
model is a three dimensional (3D) wire frame model, which
involves spatially controlling position and intensity at a
small set of more or less fixed wireframe grid points and
interpolating between the grid points. In some applications,
this approach is combined with 3D ray tracing of solid objects.
This wire frame approach is capable of providing very efficient
and compact data representation, since it involves a very deep
model, i.e., a significant amount of effort must be invested up
front to develop a comprehensive model. Accordingly, this model
provides good visual appearance.

However, this approach suffers from several significant
disadvantages. First, this causal type model requires detailed
a priori (advance) modelling information on 3D
characterization, surface texture, lighting characterization
and motion behavior. Second, this approach has very limited
empirical flexibility in generic encoders, since once the model
has been defined, it is difficult to supplement and update it
dynamically as new and unexpected images are encountered. Thus,
this type of model has limited usefulness in situations
requiring dynamic modelling of real time video sequences.

A second type of modelling system is an empirical, updatable
compression system which involves very limited model
development, but provides relatively inefficient compression.
The MPEG 1 and MPEG 2 compatible systems represent such an
approach. For example, in the MPEG standard, an image sequence
is represented as a sparse set of still image frames, e.g.,
every tenth frame in a sequence, which are
compressed/decompressed in terms of pixel blocks, such as
8 x 8 pixel blocks. The intermediate frames are reconstructed
based on the closest decompressed frame, as modified by
additional information indicating blockwise changes
representing block movement and intensity change patterns. The
still image compression/decompression is typically carried out
using Discrete Cosine Transforms (DCT), but other approaches
such as subband, wavelet or fractal still image coding may be
used. Since this approach involves very little modelling depth,
long range systematic redundancies in time and space are often
ignored so that essentially the same information is
stored/transmitted over and over again.

A third type of modelling system is an empirical global
modelling of image intensities based on factor analysis. This
approach utilizes various techniques, such as principal
component analysis, for approximating the intensities of a set
of N images by weighted sums of F "factors." Each such factor
has a spatial parameter for each pixel and a temporal parameter
for each frame. The spatial parameters of each factor are
sometimes referred to as "loadings", while the temporal
parameters are referred to as "scores". One example of this
type of approach is the Karhunen-Loeve expansion of an N x M
matrix of image intensities (M pixels per frame, N frames) for
compression and recognition of human facial images. This is
discussed in detail in Kirby, M. and Sirovich, L.,
"Application of the Karhunen-Loeve Procedure for the
Characterization of Human Faces", IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 12, No. 1, pp. 103-108
(1990), and R.C. Gonzales and R.E. Woods, Digital Image
Processing, Chapter 3.6 (Addison-Wesley Publ. Co.,
ISBN 0-201-50803-6, 1992), which are incorporated herein by
reference.

In Karhunen-Loeve expansion (also referred to as eigen
analysis or principal component analysis, Hotelling transform
and singular value decomposition), the product of the loadings
and the scores for each consecutive factor minimizes the
squared difference between the original and the reconstructed
image intensities. Each of the factor loadings has a value for
each pixel, and may therefore be referred to as
"eigen-pictures"; the corresponding factor score has a value
for each frame. It should be noted that the Karhunen-Loeve
system utilizes factors in only one domain, i.e., the intensity
domain, as opposed to the present invention which utilizes
factors in multiple domains, such as intensity, address and
probabilistic domains.

Such a compression system is very efficient in certain
situations, such as when sets of pixels display interrelated
intensity variations in fixed patterns from image to image. For
example, if every time that pixels a, b, c become darker,
pixels d, e, f become lighter, and vice versa, then all of
pixels a, b, c, d, e, f can be effectively modelled by a single
factor consisting of an eigen picture intensity loading having
positive values for pixels a, b, c and negative values for
pixels d, e, f. The group of pixels would then be modelled by a
single score number for each image. Other interrelated pixel
patterns would also give rise to additional factors.
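
The single-factor example above can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the mean intensities, the loading values and the score values are hypothetical numbers chosen for illustration, and the names `mean_image`, `loading`, `scores` and `reconstruct` do not come from the cited references.

```python
# Hypothetical illustration: six pixels (a..f) modelled by a single factor.
# The loading is positive for pixels a, b, c and negative for pixels d, e, f.
mean_image = [100, 100, 100, 100, 100, 100]   # assumed average intensities
loading = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # one "eigen picture"
scores = [-20.0, 0.0, 15.0]                   # one score number per image


def reconstruct(score):
    """Rebuild one image from the mean image, the loading and one score."""
    return [m + score * p for m, p in zip(mean_image, loading)]


images = [reconstruct(s) for s in scores]
for s, img in zip(scores, images):
    print(s, img)
```

With a negative score, pixels a, b, c become darker while pixels d, e, f become lighter, so three images of six pixels each are described by one loading and three score numbers.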

This type of approach results in visually disruptive errors in
the reconstructed image if too few factors are used to
represent the original images. Additionally, if the
image-to-image variations include large systematic spatial
changes, such as moving objects, then the number of eigen
pictures required for good visual representation will be
correspondingly high. As a result, the compression rate
deteriorates significantly. Thus, the Karhunen-Loeve systems of
factor modelling of image intensities cannot provide the
necessary compression required for video applications.

A fourth approach to video coding is the use of object
oriented codecs. This approach focuses on identifying "natural"
groups of pixels ("objects") that move and/or change intensity
together in a fairly simple and easily compressible manner.
More advanced versions of object oriented systems introduce a
certain flexibility with respect to shape and intensity of
individual objects, e.g., affine shape transformations such as
translations, scaling, rotation and shearing, or one factor
intensity changes. However, it should be noted that the object
oriented approach typically employs only single factors.

In prior art systems, motion is typically approximated by one
of two methods. The first of these methods is incremental
movement compensation over a short period of time, which is
essentially a difference coding according to which the
difference between pixels in a frame, n, and a previous frame,
n-1, is transmitted as a difference image. MPEG is one example
of this type of system. This approach allows for relatively
simple introduction of new features since they are merely
presented as part of the difference image. However, this
approach has a significant disadvantage in that dynamic
adaptation or learning is very difficult. For example, when an
object is moving in an image, there is both a change in
location and intensity, making it very difficult to extract any
systematic data changes. As a result, even the simplest form of
motion requires extensive modelling.
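
The difference coding described above can be sketched as follows. This is a simplified illustration rather than the MPEG procedure itself: the frames are hypothetical one-dimensional lists of intensities, and the function names `encode_differences` and `decode_differences` are assumptions made for this example.

```python
def encode_differences(frames):
    """Keep the first frame, then store each frame n as the difference
    image between frame n and the previous frame n-1."""
    first = list(frames[0])
    diffs = [[cur - prev for cur, prev in zip(frames[n], frames[n - 1])]
             for n in range(1, len(frames))]
    return first, diffs


def decode_differences(first, diffs):
    """Rebuild every frame by accumulating the difference images."""
    frames = [list(first)]
    for diff in diffs:
        frames.append([p + d for p, d in zip(frames[-1], diff)])
    return frames


# Hypothetical 1-D frames: a bright spot of value 9 moving to the right.
frames = [[0, 9, 0, 0], [0, 0, 9, 0], [0, 0, 0, 9]]
first, diffs = encode_differences(frames)
print(diffs)
```

Even for this trivial movement, every difference image contains one negative and one positive entry at changing positions, which hints at why systematic change patterns are hard to extract from difference images alone.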

Another approach to incremental movement compensation is
texture mapping based on a common reference frame, according to
which motion is computed relative to a common reference frame
and pixels are moved from the common reference frame to
synthesize each new frame. This is the approach typically
employed by most wire frame models. The advantage of this
approach is that very efficient and compact representation is
possible in some cases. However, the significant downside to
this approach is that the efficiency is only maintained as long
as the moving objects retain their original intensity or
texture. Changes in intensity and features are not easily
introduced, since existing systems incorporate only one
dimensional change models, in either intensity or address.

Accordingly, it is an object of the present invention to
provide a method and apparatus for data analysis which provides
very efficient and compact data representation without
requiring a significant amount of advanced modelling
information, but still being able to utilize such information
if it does exist.

It is also an object of the present invention to provide a
method and apparatus for data analysis having empirical
flexibility and capable of dynamic updating based on short and
long range systematic redundancies in various domains in the
data being analyzed.

It is a further object of the present invention to provide a
method and apparatus for data analysis which utilizes factor
analysis in multiple domains, such as address and probabilistic
domains, in addition to the intensity domain. Additionally, the
factor analysis is performed for individual subgroups of data,
e.g., for each separate spatial object.

An additional object of the present invention is to provide a
method and apparatus for data analysis which uses multiple
factors in several domains to model objects. These "soft"
models (address, intensity, spectral property, transparency,
texture, type and time) are combined with "hard" models in
order to allow for more effective learning and modelling of
systematic change patterns in input data, such as a video
image. Examples of such "hard" modelling are: a) conventional
affine motion modelling of moving objects w.r.t. translation,
rotation, scaling and shearing (including camera panning and
zooming effects), and b) multiplicative signal correction (MSC)
and extensions of this, modelling of mixed multiplicative and
additive intensity effects (H. Martens and T. Naes,
Multivariate Calibration, pp. 345-350 (John Wiley & Sons,
1989), which is incorporated herein by reference).

A further object of the present invention is the modelling of
objects in domains other than the spatial domain, e.g.,
grouping of local temporal change patterns into temporal
objects and grouping of spectral patterns into spectral
objects. Thus, in order to avoid the undesirable
oversimplification associated with physical objects or object
oriented programming, the term "holon" is used instead.

Yet another object of the present invention is the use of
change data in the various domains to relate each individual
frame to one or more common reference frames, and not to the
preceding frame of data.

SUMMARY OF THE INVENTION

The method and apparatus for data analysis of the present
invention analyze data by extracting one or more systematic
data structures found in the variations in the input sequence
of data being analyzed. These variations are grouped and
parameterized in various domains to form a reference data
structure with change models in these domains. This is used in
modelling of input data being analyzed. This type of
parameterization allows compression, interactivity and
interpretability. Each data input is then approximated or
reconstructed as a composite of one or more parameterized data
structures maintained in the reference data structure. The
flexibility of this approach lies in the fact that the
systematic data structures and their associated change model
parameters that make up the
reference data structure can be modified by appropriate
parameter changes in order to ensure the flexibility and
applicability of each individual systematic data structure to a
larger number of input data. The parameterization consists of
"soft" multivariate factor modelling in various domains for
various holons, which is optionally combined with "hard" causal
modelling of the various domains, in addition to possible error
correction residuals. A preferred embodiment of the present
invention is explained with reference to the coding of image
sequences such as video, in which case the most important
domains are the intensity, address and probabilistic domains.

The present invention includes a method and apparatus for
encoding, editing and decoding. The basic modelling or encoding
method (the "IDLE" modelling method) may be combined with other
known modelling methods, and several ways of using the basic
modelling method may be combined and carried out on a given set
of data.

The encoding portion of the present invention includes methods
for balancing the parameter estimation in the various domains.
Also, the modelling according to the present invention may be
repeated to produce cascaded modelling and meta-modelling.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing brief description and further objects, features,
and advantages of the present invention will be understood more
completely from the following description of presently
preferred embodiments with reference to the drawings in which:

Figure 1 is a flow-chart illustrating the high level operation
of the encoding and decoding process according to the present
invention;

Figure 2 is a block diagram illustrating singular value
decomposition of a data matrix into the product of a score
matrix and a loading matrix plus a residual matrix;

Figure 3a is a pictorial representation of the data format for
each individual pixel in a reference image;

Figure 3b is a pictorial representation of how a reference
frame is derived;

Figures 4a-n are pictorial illustrations of modelling in the
intensity (blush) domain, wherein:

    Figures 4a through 4c illustrate various degrees of
    blushing intensity in input images;

    Figures 4d through 4f illustrate the intensity change
    fields relative to a reference frame in the encoder;

    Figures 4g and 4h illustrate a blush factor loading that
    summarizes the change fields of several frames in the
    encoder;

    Figures 4i through 4k illustrate the reconstruction of the
    change fields in the decoder;

    Figures 4l through 4n illustrate the resulting
    reconstruction of the actual image intensities from the
    change fields and reference image, in the decoder.

Figures 5a-n are a pictorial illustration of modelling in the
address (smile) domain, wherein:

    Figures 5a through 5c illustrate various degrees of
    smiling (movements or address changes for pixels);

    Figures 5d through 5f illustrate the address change fields
    corresponding to various degrees of movements relative to
    the reference image;

    Figure 5g shows the reference intensity image and Figure
    5h illustrates a smile factor loading;

    Figures 5i through 5k illustrate the reconstructed address
    change fields;

    Figures 5l and 5n illustrate the resulting reconstructed
    smiled image intensities.

Figure 6 is a block diagram representation of an encoder
according to the present invention;

Figure 7 is a block diagram representation of a model
estimator portion of the encoder of Figure 6;

Figure 8 is a block diagram representation of a change field
estimator of the model estimator of Figure 7;

Figure 9 is a pictorial representation of the use of
forecasting and local change field estimates in the change
field estimator of Figure 8;

Figure 9a is a step-wise illustration of the use of
forecasting and local change field estimates;

Figure 9b is a summary illustration of the movements shown in
Figure 9a;

Figure 10 is a detailed block diagram of portions of the
change field estimator of Figure 8;

Figure 11 is a block diagram of the local change field
estimator portion of the change field estimator shown in
Figures 8 and 10;

Figure 12 is a block diagram of the interpreter portion of the
encoder shown in Figure 7;

Figure 13 is a block diagram of the decoder, used both as part
of the encoder in Figure 8, and as a stand-alone decoder.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The method and apparatus for data analysis of the present
invention may be used as part of a data compression system,
including encoding and decoding circuits, for compressing,
editing and decompressing video image sequences by efficient
modelling of data redundancies in various data domains of the
video image sequences.

Self-Modelling of Redundancies in Various Domains and
Sub-Operands

The system of the present invention models redundancies in the
input data (or transformed input data). These redundancies may
be found in the various domains or "operands" (such as
coordinate address, intensity, and probabilistic) and in
various sub-properties of these domains ("sub-operands"), such
as individual coordinate directions and colors. Intensity
covariations over time and space between pixels and frames, and
over time and space between color channels may be modelled.
Movement covariations are also modelled over time and space
between pixels, and over time and space between different
coordinate channels. These movement covariations typically
describe the movement of an object as it moves across an image.
The objects or holons need not be physical objects; rather,
they represent connected structures with simplified
multivariate models of systematic changes in various domains,
such as spatial distortions, intensity changes, color changes,
transparency changes, etc.

Other redundancies which may be modelled include probabilistic
properties such as opacity, which may be modelled over time and
space in the same manner as color intensities. In addition,
various low-level statistical model parameters from various
data domains may be modelled over time and space between pixels
and between frames.

In the present invention, successive input frames are modelled
as variations or deviations from a reference frame which is
chosen to include a number of characteristics or factors in the
various domains. For example, factors indicative of intensity
changes, movements and distortions are included in the
reference frame, such that input frames can be modelled as
scaled combinations of the factors included in the reference
frame. The terms factors and loadings will be used
interchangeably to refer to the systematic data structures
which are included in the reference frame.

Abstract Redundancy Modelling

The system and method of the present invention combine various
model structures and estimation principles, and utilize data in
several different domains, producing a model with a high level
of richness and capable of reconstructing several different
image elements. The model may be expressed at various levels of
depth.

The modelling features of the present invention are further
enhanced by using externally established model parameters from
previous images. This procedure utilizes pre-established
spatial and/or temporal change patterns, which are adjusted to
model a new scene. Further enhancement may be obtained by
modelling redundancies in the model parameters themselves,
i.e., by performing principal component analysis on the sets of
model parameters. This is referred to as meta-modelling.

The present invention may employ internal data representations
that are different from the input and/or output data format.
For example, although the input/output format of video data may
be RGB, a different color space may be used in the internal
parameter estimation, storage, transmission or editing.
Similarly, the coordinate address system may be Cartesian
coordinates at a certain resolution (e.g., PAL format), while
the internal coordinate system may be different, e.g., NTSC
format or some other regular or irregular, dense or sparse
coordinate system, or vice versa.

Encoder

An encoder embodying the present invention provides models to
represent systematic structures in the input data stream. The
novel model parameter estimation is multivariate and allows
automatic self-modelling without the need for any prior model
information. However, the system can still make effective use
of any previously established model information if it is
available. The system also provides dynamic mechanisms for
updating or eliminating model components that are found to be
irrelevant or unreliable. The system is also flexible in that
different level models may be used at different times. For
example, at times it may be advantageous to use shallow
intensity based compression, while at other times it may be
desirable to use deep hard models which involve extensive prior
analysis.

Additionally, the present system includes automatic
initialization and dynamic modification of the compression
model. In addition, the present invention may be used for any
combination of compression, storage, transmission, editing, and
control, such as are used in video telephone, video
compression, movie editing, interactive games, and medical
image databases.

In addition, the present invention can use factor modelling to
simplify and enhance the model parameter estimation in the
encoder, by using preliminary factor models for conveying
structural information between various local parts of the input
data, such as between individual frames in a video sequence.
This structural information is used statistically in the
parameter estimation for restricting the number of possible
parameter values used to model each local part, e.g., frame.
This may be used in the case of movement estimation, where the
estimation of the movement field for one frame is stabilized
with the help of a low-dimensional factor movement model
derived from other frames in the same sequence.
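
One simple way to picture this stabilization is to project a noisy local movement estimate onto a factor loading obtained from other frames. The sketch below is an assumption-laden illustration, not the estimator of the present invention: it uses a single hypothetical loading, ordinary least squares, and the invented name `stabilize`.

```python
def stabilize(movement_field, loading):
    """Least-squares fit of one factor score to a noisy movement field,
    followed by reconstruction of a smoothed field from that score."""
    score = (sum(d * p for d, p in zip(movement_field, loading))
             / sum(p * p for p in loading))
    return score, [score * p for p in loading]


# Hypothetical loading from other frames: all pixels move together.
loading = [1.0, 1.0, 1.0, 1.0]
# Hypothetical noisy per-pixel movement estimates for the current frame.
noisy_field = [2.0, 3.0, 1.0, 2.0]

score, smooth_field = stabilize(noisy_field, loading)
print(score, smooth_field)
```

The four noisy values are replaced by one score, so the number of free parameter values for the frame is reduced from four to one.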

An encoder according to the present invention compresses large
amounts of input data, such as a video data stream, by
compressing the data in separate stages according to various
models. In general, video sequences or frames can be
represented by the frame-to-frame or interframe variations,
including the variation from a blank image to the first frame
as well as subsequent interframe variations. In the present
encoder, interframe variations are detected, analyzed and
modelled in terms of spatial, temporal and probabilistic model
parameters in order to reduce the amount of data required to
represent the original frames. The obtained model parameters
may then be further compressed to reduce the data stream
necessary for representing the original images. This further
compression may be carried out by run length coding, Huffman
coding or any other statistical compression technique.
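
As an illustration of the run length coding mentioned above, the following minimal sketch collapses repeated values into pairs of value and count. The input list of quantized parameter values is hypothetical, and the function names are assumptions made for this example.

```python
def run_length_encode(values):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for v, count in runs:
        out.extend([v] * count)
    return out


# Hypothetical quantized model parameters with long runs of zeros.
quantized = [0, 0, 0, 0, 3, 3, 0, 0, 0, -1]
runs = run_length_encode(quantized)
print(runs)
```

Run length coding pays off only when the parameters contain long runs, which is why it is typically applied after quantization.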

The compressed data may then be edited (e.g., as part of a
user-controlled video game or movie editing system), stored
(e.g., in a CD-ROM, or other storage medium) or transmitted
(e.g., via satellite, cable or telephone line), and then
decompressed for use by a decoder.

Decoder

The present invention also provides for a decoder at a
receiving or decompression location, which essentially performs
the inverse function of the encoder. The decoder receives the
compressed model parameters generated by the encoder and
decompresses them to obtain the model parameters. The model
parameters are then used to reconstruct the data stream
originally input to the encoder.

Parameter Estimation in the Encoder

Extending, Widening and Deepening of a Reference Model

In the encoder of the present invention, one or more extended
reference images are developed as a basis for other model
parameters to represent the input data stream of image
sequences or frames. Thus, all images are represented as
variations or changes relative to the extended reference
images. The reference images are chosen so as to be
representative of a number of spatial elements found in a
sequence of images. The reference image is "extended" in the
sense that the size of the reference image may be extended
spatially relative to an image or frame in order to accommodate
and include additional elements used in modelling the image
sequences. Conceptually, the reference frame in the preferred
embodiment is akin to a collage or library of picture elements
or components.

Thus, a long sequence of images can be represented by a simple
model consisting of an extended reference image plus a few
parameters for modelling systematic image changes in address,
intensity, distortion, transparency or other variable. When
combined with individual temporal parameters for each frame,
these spatial parameters define how the reference image
intensities in the decoder are to be transformed into a
reconstruction of that frame's intensities. Reconstruction
generally involves two stages. First, it must be determined how
the reference frame intensities are to be changed spatially in
terms of intensity, transparency, etc. from the reference
coordinate system and representation to the output frame
coordinate system and representation. Second, the reference
frame intensities must be changed to the output frame
intensities using image warping.
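
The two reconstruction stages can be sketched for a one-dimensional reference image as follows. This is a deliberately crude illustration under several assumptions: there is one intensity ("blush") factor and one address ("smile") factor, the loadings and scores are hypothetical, the warp is a forward move with nearest-pixel rounding, and the function name `reconstruct_frame` is invented for this example.

```python
def reconstruct_frame(reference, blush_loading, smile_loading,
                      blush_score, smile_score, background=0):
    """Rebuild one output frame from a reference image and factor scores."""
    # Stage 1: determine the change fields for this frame from the
    # temporal scores and the spatial loadings.
    intensity_change = [blush_score * b for b in blush_loading]
    address_change = [smile_score * s for s in smile_loading]

    # Stage 2: change the intensities and move each reference pixel to its
    # new address (forward warp; pixels leaving the frame are dropped).
    output = [background] * len(reference)
    for address, value in enumerate(reference):
        target = int(round(address + address_change[address]))
        if 0 <= target < len(output):
            output[target] = value + intensity_change[address]
    return output


reference = [0, 50, 50, 0, 0, 0]          # hypothetical reference intensities
blush_loading = [0, 1, 1, 0, 0, 0]        # only the object changes intensity
smile_loading = [1, 1, 1, 1, 1, 1]        # all pixels move together
frame = reconstruct_frame(reference, blush_loading, smile_loading,
                          blush_score=10, smile_score=2)
print(frame)
```

A practical warp would also need to resolve overlaps and fill holes, which this sketch ignores.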

System Operation

Figure 1 is a block diagram illustration of the high level
operation of the present invention, showing both the encoding
and decoding operations. In the encoder, video input data 102
is first input to the system at step 104 and changes are
detected and modelled at steps 106 and 108 respectively, in
order to arrive at appropriate model parameters 110.

The model parameters 110 are then compressed at step 111 in
order to further reduce the amount of information required to
represent the original input data. This further compression
takes advantage of any systematic data redundancies present in
the model parameters 110. These temporal parameters also
exhibit other types of redundancies. For example, the scores or
scalings which are applied to the loadings or systematic data
structure in the reference frame may have temporal
autocorrelation, and can therefore be compressed by, for
example, predictive coding along the temporal dimension.
Additionally, there are correlations between scores which can
be exploited by bilinear modelling, followed by independent
compression and transmission of the model parameters and
residuals. Likewise, other redundancies, such as
intercorrelations between colors or between parameters, may be
modelled.
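
Predictive coding of temporally autocorrelated scores can be sketched as below. The predictor used here (each score is predicted by the previous score) is an assumption chosen for simplicity, as are the score values and the function names.

```python
def predictive_encode(scores):
    """Predict each score by the previous one and keep only the
    prediction errors (the first score is predicted as 0)."""
    residuals = []
    previous = 0
    for s in scores:
        residuals.append(s - previous)
        previous = s
    return residuals


def predictive_decode(residuals):
    """Invert predictive_encode by accumulating the prediction errors."""
    scores = []
    previous = 0
    for r in residuals:
        previous = previous + r
        scores.append(previous)
    return scores


# Hypothetical slowly varying factor scores, one per frame.
scores = [10, 12, 13, 15, 16, 18]
residuals = predictive_encode(scores)
print(residuals)
```

After the first value, the residuals span a much smaller range than the scores themselves, so a subsequent statistical coder can represent them with fewer bits.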

These model parameters 110 are then used by a decoder
according to the present invention where the model parameters
are first decompressed at step 120, and at step 122, used to
reconstruct the original input image, thereby producing the
image output or video output 124.

The decompression procedure at step 120 is essentially the
inverse process that was performed in the compression step 111.
It should be noted that the encoder and decoder according to
the present invention may be part of a real-time or pseudo
real-time video transmission system, such as picture telephone.
Alternatively, the encoder and decoder may be part of a storage
type system, in which the encoder compresses video images or
other data for storage, and retrieval and decompression by a
decoder occur later. For example, a video sequence may be
stored on floppy disks, tape or another portable medium.
Furthermore, the system may be used in games, interactive video
and virtual reality applications, in which case the temporal
scores in the decoder are modified interactively. The system
may also be used for database operations, such as medical
imaging, where the parameters provide both compression and
effective search or research applications.

Soft Modelling by Factor Analysis of Different Domains and
Sub-Operands

The present invention utilizes factor analysis, which may be
determined by principal component analysis or singular value
decomposition, to determine the various factors which will be
included in the reference frame. A video sequence which is
input to the present invention may be represented as a series
of frames, each frame representing the video sequence at a
specific moment in time. Each frame, in turn, is composed of a
number of pixels, each pixel containing the data representing
the video information at a specific location in the frame.



In accordance with the present invention, input frames are
decomposed into a set of scores or weightings in various domains
and sub-operands which are to be applied to one or more factors
contained in a reference frame. As shown in Figure 2, N input
frames, each composed of M variables, e.g., pixels, may be
arranged in an N by M matrix 202. In this representation, the
pixels are arranged as one line for each frame, instead of the
conventional two-dimensional row/column arrangement. The matrix
202 may then be decomposed or represented by temporal score
factors f = 1, 2, ..., F for each frame, forming an N by F matrix
204, multiplied by a spatial reference model, consisting of
spatial loadings for the F factors, each with values for each of
the M pixels, thus forming a loading matrix 206 of size F by M.
If the number of factors F is less than the smaller of N or M, a
matrix of residuals (208) may be used to summarize the
unmodelled portion of the data. This is described in further
detail in H. Martens and T. Naes, Multivariate Calibration,
Chapter 3 (John Wiley & Sons, 1989), which is incorporated
herein by reference. This type of assumption-weak self-modelling
or "soft modelling" may be optionally combined with more
assumption-intensive "hard modelling" in other domains, such as
movements of three-dimensional solid bodies and mixed
multiplicative/additive modelling of intensities by MSC
modelling and extensions of this (H. Martens and T. Naes,
Multivariate Calibration, pp. 345-350 (John Wiley & Sons, 1989),
which is incorporated herein by reference).
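The decomposition of matrix 202 into scores 204, loadings 206
and residuals 208 can be sketched numerically. The following is
a minimal illustration only, assuming NumPy and a truncated
singular value decomposition; the function name
bilinear_decompose and the toy data are hypothetical and do not
appear in the specification.

```python
import numpy as np

def bilinear_decompose(X, n_factors):
    """Split an N-by-M data matrix into scores (N-by-F), loadings (F-by-M)
    and residuals (N-by-M) using a truncated singular value decomposition."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_factors] * s[:n_factors]   # temporal scores, one row per frame
    loadings = Vt[:n_factors, :]                # spatial loadings, one row per factor
    residuals = X - scores @ loadings           # unmodelled part of the data
    return scores, loadings, residuals

# Toy data (assumption): N = 4 frames, each flattened to M = 6 pixels on one line.
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 6))
scores, loadings, residuals = bilinear_decompose(frames, n_factors=2)
```

With F smaller than min(N, M) the residual matrix is generally
non-zero; with F equal to min(N, M) it vanishes up to rounding.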

Figure 3b illustrates how several objects from different frames
of a video sequence may be extracted as factors and combined to
form a reference frame. As shown in Figure 3b, frame 1 includes
objects 11 and 12, a taxi and building, respectively. Frame 4
includes the building 12 only, while frame 7 includes building
12 and car 13. An analysis of these frames in accordance with
the present invention results in reference frame 20 which
includes objects 11, 12, and 13. It should be noted that the
holons need not be solid objects such as a house or a car.
Rather, the same principles may be used to spatially represent
more plastic or deformable objects such as a talking head;
however, change factors in other domains may be required.

Figure 3a is a pictorial representation of the data format for
each individual pixel in a reference image. Coordinate systems
other than conventional pixels may also be used in the model
representation. These include pyramidal representations, polar
coordinates or any irregular, sparse coordinate system.

As shown in Figure 3a, each pixel contains intensity
information, which may be in the form of color information given
in some color space, e.g., RGB; address information which may be
in the form of vertical (V), horizontal (H), and depth (Z)
information; in addition to probabilistic, segment, and other
information, the number of such probabilistic values being
different during the encoder parameter estimation as compared
with after the parameter estimation.

Each of these information components may in turn at various
stages be composed of one or more information sub-components
which may in turn be composed of one or more further information
sub-components. For example, as shown in Figure 3a, the red (R)
color intensity information contains several red information
components R(0), R(1), R(2), .... Similarly, R(2) contains one
or more information sub-components indicating parameter value,
uncertainty, and other statistical information.

The choice of objects which are used to construct the reference
image depends on the type of application. For example, in the
case of off-line encoding of previously recorded video images,
objects will be chosen to make the reference image as
representative as possible for long sequences of frames. In
contrast, for on-line or real time encoding applications, such
as picture telephone or video conferencing, objects will be
selected such that the reference image will closely correspond
to the early images in the sequence of frames. Subsequently,
this initial reference frame will be improved or modified with
new objects as new frame sequences are encountered and/or
obsolete ones eliminated.

General temporal information ("scores") is represented by the
letter u followed by a second letter indicating the type of
score, e.g., uA for address scores. Occasionally, a subscript is
added to indicate a specific point in time, e.g., $uA_n$, to
indicate frame n.

Spatial information is represented in a hierarchical format.
The letter X is used to represent spatial information in
general, and includes one or more of the following domains: I
(intensity), A (address) and P (probabilistic properties). These
domains represent data flow between operators and are thus
referred to as operands. Each of these domain operands may in
turn contain one or more "sub-operands." For example, intensity
I may contain R, G and B sub-operands to indicate the specific
color representation being used. Similarly, address A may
contain V (vertical), H (horizontal) and Z (depth) sub-operands
to indicate the specific coordinate system being used. Also,
probabilistic properties P may include sub-operands S (segment)
and T (transparency). Spatial information may be represented in
different formats for different pixels. In addition, the various
domains and sub-operands may be reformulated or redefined at
various stages of the data input, encoding, storage,
transmission, decoding and output stages.

Each spatial point or pixel may thus be represented by a number
of different values from different domains and sub-operands. For
each sub-operand, there may be more than one parameter or
"change factor." The factors are counted up from zero, with the
zeroth factor representing the normal image information (default
intensity and address). Thus, within X(0), I(0) represents
normal picture intensity information, A(0) represents implicit
coordinate address information and P(0) represents probabilistic
information such as transparency, while X(f), f > 0, represents
various other change model parameters or factor loadings, i.e.,
systematic patterns in which the pixels vary together in the
different domains.

Spatial information is defined for objects according to some
spatial position, which is given in upper case letters, lower
case letters and subscripts. Upper case letters refer to spatial
information in the reference image position, lower case letters
refer to spatial information in the position of a specific
image, with the specific image being indicated by a subscript.
Thus, $X_{Ref}$ refers to the spatial model in the reference
position for a given sequence, while $x_n$ refers to spatial
data for input frame n.

Change fields, which are unparameterized difference images, are
used to indicate how to change one image into another according
to the various domains. Change fields are indicated using a two
letter symbol, typically used in conjunction with a two letter
subscript. The first letter of the two letter symbol is D or d
which indicates difference or delta, while the second letter
indicates the domain or sub-operand. The subscripts are used to
designate the starting and ending positions. For example,
$DA_{Ref,m}$ defines how to move the pixel values given in the
reference position into those of reconstructed frame m, while
$da_{mn}$ defines how to move pixel values from frame m to frame
n.

Widening a Reference Model to Allow a Wider Range
of Systematic Expression

A reference image may be "widened" to include more types of
change information than those available in the individual input
images. For example, the picture intensity of a color image in
an RGB system is typically represented by a single R, G and B
intensity value for each of the red, green and blue color
components associated with each individual pixel. However, in
the case of a widened reference image, there may be several
systematic ways in which groups of pixels change together. These
change factor loadings may be defined for individual colors or
combinations of colors, and for individual holons or groups of
holons.

The "widening" of the reference image for a given video sequence
may also be performed for data domains other than color
intensities, such as address (coordinates) and various
probabilistic properties such as transparency. Widening of the
reference image is used to refer to the parameterization of the
model used for a particular scene. By combining different model
parameters in different ways in a decoder, different individual
manifestations of the model may be created. These output
manifestations may be statistical approximations of the
individual input data (individual video frames), or they may
represent entirely new, synthesized outputs, such as in virtual
reality applications.

The widening parameterization of the reference frame in various
domains may be obtained using a combination of "soft" factor
analytic modelling, traditional statistical parameters, ad hoc
residual modelling and "hard" or more causally oriented
modelling.

Once an extended or widened reference image model is
established, it may be dynamically modified or updated to
produce a "deepened" reference image model. This "deepened"
reference model includes "harder" model parameters that have a
high probability of representing important and relevant image
information, and a low probability of representing unimportant
and irrelevant change information.

The purpose of widening in the various domains is to combine, in
a compact and flexible representation, change image information
from various frames in a sequence. In the case of automatic
encoding, this may be accomplished by combining new change
information for a given frame with the change image information
from previous frames in order to extract systematic and
statistically stable common structures. This is preferably
accomplished by analyzing the residual components of several
frames and extracting model parameter loadings. The computations
may be carried out directly on the residuals or on various
residual cross products. Different weighting functions can be
used to ensure that precise change information is given more
emphasis than imprecise change information, as described in H.
Martens and T. Naes, Multivariate Calibration, pp. 314-321 (John
Wiley & Sons, 1989), which is incorporated herein by reference.
The extraction of new bilinear factors and other parameters may
be performed on different forms of the data, all providing
essentially the same result. The data format may be raw image
data, residual image information after removal of previously
extracted model parameters or model parameters already extracted
by some other method or at a different stage in the encoding
process.

Several types of modellable structures may be extracted during
the widening process. One general type is based on
spatio-temporal covariations, i.e., one or more informational
domains vary systematically over several pixels over several
frames. A typical form of covariation is multivariate linear
covariance, which can be approximated by bilinear factor
modelling. This type of factor extraction is applicable to each
of the different domains, e.g., address, intensity and
probabilistic. Nonlinear or non-metric summaries of covariations
may also form the basis for the widening operations.

Bilinear factors may, for example, be extracted using singular
value decomposition, which is applied to the residual components
from a number of frames. Singular value decomposition maximizes
the weighted sum-of-squares used for extracting factors, but
does not provide any balancing or filtering of noise, or
optimizing of future compression. More advanced estimation
techniques, such as the non-linear iterative least squares power
method (NIPALS), may be used. The NIPALS method is an open
architecture allowing the use of additional criteria, as needed.

The NIPALS method is applied to a matrix of residual values
$E_{a-1}$ (matrix E in a system with a-1 factors) from several
frames in order to extract an additional factor and thereby
reduce the size of the residual matrix to $E_a$ (residual matrix
in a system having a factors). The residual matrix $E_a$ can in
turn be used to find the (a+1)th factor, resulting in residual
matrix $E_{a+1}$.
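The factor-by-factor deflation described above can be sketched
as follows. This is only an illustration of the basic NIPALS
power iteration in its plain, unweighted form, assuming NumPy;
the function name nipals_factor, the convergence settings and
the toy residual matrix are hypothetical, and the additional
criteria mentioned in the text are not included. The sketch
assumes a non-zero residual matrix.

```python
import numpy as np

def nipals_factor(E, max_iter=500, tol=1e-10):
    """Extract one bilinear factor from residual matrix E (frames x pixels).
    Returns (scores t, loading p, deflated residual matrix)."""
    # Start from the pixel column with the largest sum of squares.
    t = E[:, np.argmax((E ** 2).sum(axis=0))].copy()
    for _ in range(max_iter):
        p = E.T @ t / (t @ t)          # regress the columns of E on t
        p /= np.linalg.norm(p)         # normalize the loading to unit length
        t_new = E @ p                  # regress the rows of E on p
        converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    return t, p, E - np.outer(t, p)

# Toy residuals (assumption): one strong pattern plus weak noise.
rng = np.random.default_rng(1)
pattern = np.outer(np.linspace(-1.0, 1.0, 5), np.linspace(1.0, 2.0, 8))
E0 = 10.0 * pattern + rng.normal(scale=0.1, size=(5, 8))

t1, p1, E1 = nipals_factor(E0)   # first factor, leaves residual E1
t2, p2, E2 = nipals_factor(E1)   # next factor, leaves residual E2
```

Each call removes the sum of squares carried by the extracted
factor, so the residual matrix shrinks in norm from E0 to E1 to
E2.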

This type of factor analysis may be applied to the different
sub-operands in the various domains, and not just to the image
intensities. Typically, address information for a picture frame
is given in terms of Cartesian coordinates which specify
horizontal and vertical addresses for each pixel location.
However, in a widened reference frame, the address information
may include multiple variables for each single input pixel's
coordinates.

The additional change factors in a widened reference image widen
the range of applicability of the resulting image model in the
sense that many additional different visual qualities or
patterns may be represented by different combinations of the
additional change factors or "loadings." In a preferred
embodiment according to the present invention, the different
loadings are combined linearly, i.e., each loading is weighted
by a "score" and the weighted loadings are summed to produce an
overall loading. The score values used in the weighting process
may be either positive or negative and represent a scale factor
applied to the loadings or change factors. This will now be
illustrated for sub-operands red intensity $r_n$, n = 1, 2, ...,
N and vertical address $v_n$, n = 1, 2, ..., N. When modelling
intensity changes, the scores may be used to "turn up" or "turn
down" the intensity pattern of the loading. Similarly, when
modelling address distortion (movements), the scores are used to
represent how much or how little the loading is to be distorted.

Utilizing the above-mentioned widening principle for widening a
reference frame, an individual input frame's redness intensity
$r_n$, for example, may be modelled as a linear combination or
summation of redness change factor loadings (note that the "hat"
symbol here is used in its conventional statistical meaning of
"reconstructed" or "estimated"):

$$\hat{r}_n = R_{Ref}(0) \cdot uR(0)_n + R_{Ref}(1) \cdot uR(1)_n + R_{Ref}(2) \cdot uR(2)_n + \ldots \qquad (1)$$

which may also be summarized over factors f = 0, 1, 2, ... using
matrix notation as:

$$\hat{r}_n = R_{Ref} \cdot UR_n$$

where $R_{Ref} = \{R_{Ref}(0), R_{Ref}(1), R_{Ref}(2), \ldots\}$
represents the spatial change factor loadings for redness in the
extended reference model (for this holon), and
$UR_n = \{uR(0)_n, uR(1)_n, uR(2)_n, \ldots\}$ represents the
temporal redness scores which are applied to the reference model
(designated as i) to produce an estimate of frame n's redness.
Intensity change factors of this type are herein called "blush
factors" because they may be used to model how a face blushes.
However, it will be appreciated that these factors may be used
to model many other types of signals and phenomena, including
those not associated with video.

The use of these so-called blush factors is illustrated in
Figures 4a through 4n. Figures 4a, 4b and 4c show the intensity
images $r_n$, n = 1, 2, 3 of a red color channel for a person
blushing moderately (4a), blushing intensely (4b) and blushing
lightly (4c), respectively. The first frame $r_1$ is here
defined as the reference frame. Accordingly, $R_{Ref}(0) = r_1$.

Figures 4d through 4f show the corresponding intensity change
fields $DR_{Ref,n}$, n = 1, 2, 3. In this non-moving
illustration, the change field for a frame equals the difference
between the frame and the reference image, or
$DR_{Ref,n} = r_n - R_{Ref}(0)$. The change field is also shown
as a curve for a single line taken through the blushing cheeks
of Figures 4a through 4c. As shown in Figures 4d through 4f, the
lightly blushing (pale) face of Figure 4c has the lowest
intensity change field values (Figure 4f), the moderately
blushing face of Figure 4a has no intensity change, since it
actually is the reference image (Figure 4d), while the intensely
blushing face of Figure 4b has the highest intensity change
field values (Figure 4e).

The statistical processing of the present invention will extract
a set of generalized blush characteristics or change factor
loadings, to be used in different frames to model blushing
states of varying intensity. Figures 4a through 4f indicate a
single blush phenomenon with respect to the reference image. The
principal component analysis of the change fields $DR_{Ref,n}$,
n = 1, 2, 3 may give a good description of this using one single
blush factor, whose loading $R_{Ref}(1)$ is shown in Figure 4h
with the respective scores (0, 1.0 and -0.5) given below. The
modelling of the red intensity during decoding in this case is
achieved by applying these different scores to the main blush
factor loading $R_{Ref}(1)$ to produce different change fields
$DR_{Ref,n}$ (Figures 4i through 4k) and adding that to the
reference image redness (Figure 4g) to produce the reconstructed
redness images (Figures 4l through 4n):

$$\hat{r}_n = R_{Ref}(0) + DR_{Ref,n}$$

where the redness change field is:

$$DR_{Ref,n} = R_{Ref}(1) \cdot uR(1)_n$$

As indicated by the numbers below Figures 4d through 4f, the
score value $uR(1)_n$ in this case is 0 for the reference image
(4a) itself, since here $\hat{r}_1 = R_{Ref}(0)$, is positive,
e.g., 1.0, for the second frame (4b) with more intense blushing,
and is negative, e.g., -0.5, for the pale face in the third
frame (4c). It should be noted that the negative score for the
third frame, Figure 4c, transforms the positive blush loadings
of Figure 4h into a negative change field $DR_{Ref,3}$ for the
third image, which is paler than the reference frame.
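The single-factor blush example can be reproduced on synthetic
data. The sketch below is an illustration only, assuming NumPy,
an uncentered singular value decomposition as the principal
component step, and a made-up one-dimensional red-channel
profile through two cheeks; none of the numerical values other
than the scores 0, 1.0 and -0.5 come from the specification.

```python
import numpy as np

# Hypothetical 1-D red-channel profile along a line through both cheeks.
x = np.linspace(0.0, 1.0, 50)
cheeks = np.exp(-((x - 0.3) / 0.05) ** 2) + np.exp(-((x - 0.7) / 0.05) ** 2)
r1 = 100.0 + 20.0 * cheeks            # frame 1: moderate blush (reference)
r2 = r1 + 30.0 * cheeks               # frame 2: intense blush
r3 = r1 - 15.0 * cheeks               # frame 3: pale face
frames = np.vstack([r1, r2, r3])

R_ref0 = r1                           # R_Ref(0): the reference redness
DR = frames - R_ref0                  # change fields DR_Ref,n, one row per frame

# One-factor model from an uncentered SVD of the change fields.
U, s, Vt = np.linalg.svd(DR, full_matrices=False)
raw_scores = U[:, 0] * s[0]
# The scale and sign of a factor are arbitrary; fix them so frame 2 scores 1.0.
scores = raw_scores / raw_scores[1]   # uR(1)_n
R_ref1 = Vt[0] * raw_scores[1]        # blush loading R_Ref(1)

# Decoding: reference redness plus scaled blush loading.
r_hat = R_ref0 + np.outer(scores, R_ref1)
```

Because the three synthetic change fields are exact multiples of
one pattern, a single factor reconstructs all three frames and
the scores come out proportional to 0, 1.0 and -0.5.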

If more than one phenomenon contributed to the redness change in
the images of this sequence, then the model would require more
than one change factor. For example, if the general illumination
in the room was varied, independent of the person blushing and
paling, this situation may be modelled using a two factor
solution, where the second factor involves applying a score
$uR(0)_n$ to the reference frame itself:

$$\hat{r}_n = R_{Ref}(0) + DR_{Ref,n}$$

where the blush change field is:

$$DR_{Ref,n} = R_{Ref}(0) \cdot uR(0)_n + R_{Ref}(1) \cdot uR(1)_n$$

which may be generalized for different colors and different
factors as:

$$DI_{Ref,n} = I_{Ref} \cdot uI_n \qquad (2)$$

Thus, Figures 4a-4n show how the effect of blush factor loading
4h (contained in $I_{Ref}$) can be increased or decreased
(appropriately scaled by scores $uI_n$) to produce various blush
change fields such as are shown in Figures 4d through 4f. In
this manner, significant amounts of intensity information may be
compressed and represented by a single loading (Figure 4h) and a
series of less data intensive scores.
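The matrix form of equation (2) can be illustrated for the
two-factor case (illumination plus blush). This is a sketch
under stated assumptions: NumPy is used, the loadings and score
values are invented for the example, and loadings are stored as
columns so that one matrix product decodes several frames at
once.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
cheeks = np.exp(-((x - 0.3) / 0.05) ** 2) + np.exp(-((x - 0.7) / 0.05) ** 2)
R_ref0 = 100.0 + 20.0 * cheeks        # R_Ref(0): reference redness (hypothetical)
R_ref1 = 30.0 * cheeks                # R_Ref(1): blush loading (hypothetical)

# Loadings as columns of a pixels-by-factors matrix.
R_ref = np.column_stack([R_ref0, R_ref1])

# One column of scores per frame: row 0 is uR(0)_n, row 1 is uR(1)_n.
UR = np.array([[0.0, 0.1, -0.2],      # illumination: unchanged, +10 %, -20 %
               [0.0, 1.0, -0.5]])     # blush: none, intense, pale

DR = R_ref @ UR                       # change fields, pixels x frames
r_hat = R_ref0[:, None] + DR          # reconstructed redness for each frame
```

Only the two loadings and two scores per frame need to be stored
to regenerate every frame in this toy sequence.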

Changes in transparency T and changes in probabilistic
properties P may be modelled in a similar manner. In the case of
probabilistic modelling, bilinear modelling is used in the
preferred embodiment of the present invention. The spatial
loadings P(f), f = 0, 1, 2, and corresponding scores uP(f),
f = 1, 2, ..., together constitute the probabilistic change
factors.

Similar to the blush factors used to represent intensity
information, address information may also be modelled by a
linear combination of change factor loadings. For example, a
frame's vertical address information $v_n$ may be modelled in
terms of a linear combination or summation of change factor
loadings:

$$DV_{Ref,n} = V_{Ref}(0) \cdot uV(0)_n + V_{Ref}(1) \cdot uV(1)_n + V_{Ref}(2) \cdot uV(2)_n + \ldots \qquad (1)$$

which may also be summarized over vertical movement factors
f = 0, 1, 2, ... in matrix notation as:

$$DV_{Ref,n} = V_{Ref} \cdot UV_n$$

where $V_{Ref} = \{V_{Ref}(0), V_{Ref}(1), V_{Ref}(2), \ldots\}$
is the vertical spatial address change factor loadings in the
extended reference model (for this holon), and
$UV_n = \{uV(0)_n, uV(1)_n, uV(2)_n, \ldots\}$ represents the
temporal vertical movement scores which are applied to the
reference model in order to produce an estimate of frame n's
vertical coordinates for the various pixels in the frame.
Address change factors of this type are referred to as "smile"
factors, because they may be used to model how a face smiles.

WO 95/08240    PCT/US94/10190

Similar to the blush factors, here the vertical address change
field needed to move the contents of the reference frame to
approximate an input frame is referred to as $DV_{Ref,n}$. It
may be modelled as a sum of change contributions from address
change factor loadings ($V_{Ref}$) scaled by appropriate scores
($uV_n$). The address change factors are used to model motion
and distortion of objects. The address change factors used to
model distortion of objects are referred to as "smile factors"
because they may be used to model generalized, "soft" movements,
e.g., how a face smiles. However, it will be appreciated that
smile factors can equally well model any signal or phenomenon,
including those not associated with video, which may be modelled
as a complex of samples which may be distorted while still
retaining a common fundamental property.

The use of smile factors in accordance with the present
invention is illustrated in Figures 5a through 5n. Figures 5a
through 5c show a face exhibiting varying degrees of smiling.
Figure 5a shows a moderate smile; Figure 5b shows an intense
smile; and Figure 5c shows a negative smile or frown. The
moderately smiling face of Figure 5a may be used as part of the
reference frame Figure 5g for illustration. The address change
fields $DV_{Ref,n}$ corresponding to vertical movements of the
mouth with respect to the reference image, as shown in Figures
5a through 5c, are shown in Figures 5d through 5f. The concept
of "reference position" (corresponding to the reference image
Figure 5g) is here illustrated for Figures 5d, 5e and 5f, in
that numerical values of each pel in an address change field
$DV_{Ref,n}$ are given at pixel coordinates in the reference
image of Figure 5g, not at the coordinates in frames n = 1, 2, 3
(Figures 5a through 5c). Thus, the vertical change fields
(movements) necessary to transform the reference image (Figure
5g) into each of the other frames of Figures 5a through 5c are
shown as vertical arrows at three points along the mouth at the
position where the mouth is found in the reference image (Figure
5g). The base of the arrows is the location of the mouth in the
reference image (Figure 5g), while the tips of the arrows are
located at the corresponding points on the mouth in the other
frames of Figures 5a through 5c. The full change fields are also
given quantitatively alongside Figures 5d through 5f as
continuous curves for the single line through the mouth in the
reference image (Figure 5g).



Since the first frame of Figure 5a in this illustration
functions both as the reference image (Figure 5g) and as an
individual frame, the vertical smile change field $DV_{Ref,1}$
for frame 1 (Figure 5d) contains all zeros. In Figure 5b, the
middle of the mouth moves downward and the ends of the mouth
move upward. Thus, the smile field $DV_{Ref,2}$ is negative in
the middle and positive at either side of the mouth in its
reference position. The frown of Figure 5c illustrates the
opposite type of pattern. These change fields thus contain only
one type of main movement and may thus be modelled using only
one smile factor, and this may be extracted by principal
component analysis of the change fields in Figures 5d through
5f. The smile factor scores $uV_n$ are, in this illustration,
zero for the reference image itself (Figure 5a), positive for
frame 2 (Figure 5b) and negative for frame 3 (Figure 5c), when
the common vertical smile loading is as shown in Figure 5h.

If the head shown in Figures 5a through 5c were also moving,
i.e., nodding, independently of the smile action, then a more
involved movement model would be needed to accurately model all
the various movements. In the simplest case, one or more
additional smile factors could be used to model the head
movements, in much the same manner as multi-factor blush
modelling. Each smile factor would then have spatial loadings,
with a variety of different movements being simply modelled by
various combinations of the few factor scores. Spatial rotation
of image objects in two or three dimensions would require factor
loadings in more coordinate dimensions, or alternatively require
various coordinate dimensions to share some factor loadings. For
example, if the person in Figures 5a-5n tilted their head 45
degrees sideways, the smile movements modelled in Figures 5a-5n
as purely vertical movements would no longer be purely vertical.
Rather, an equally strong horizontal component of movement would
also be required. The varying smile of the mouth would still be
a one-factor movement, but now with both a vertical and a
horizontal component. Both a vertical and a horizontal loading
may be used, in this case with equal scores. Alternatively, both
the vertical and horizontal movement may share the same loading
(Figure 5h), but again with different scores depending on the
angle of the tilting head.

For better control and simpler decoding and compression, some
movements may instead be modelled by a hard movement model,
referred to as "nod" factors. The nod factors do not utilize
explicit loadings, but rather refer to affine transformations of
solid bodies, including camera zoom and movements. Smile and nod
movements may then be combined in a variety of ways. In a
preferred embodiment according to the present invention, a
cascade of movements is created according to some connectivity
criteria. For example, minor movements and movement of pliable,
non-solid bodies, such as a smiling mouth, may be modelled using
smile factors (soft modelling), while major movements and
movement of solid bodies, such as a head, may be modelled using
nod factors (hard modelling). In the case of a talking head, the
soft models are first applied to modify the initial vertical
reference addresses $V_{Ref}$ to the "smiled" coordinates in the
reference position, $V_{n,smiled@Ref}$. The same procedure is
carried out for the horizontal, and optionally for the depth,
coordinates, forming $A_{n,smiled@Ref}$. These smiled
coordinates $A_{n,smiled@Ref}$ are then modified by affine
transformations, i.e., rotation, scaling, shearing, etc., to
produce the smiled and nodded coordinate values, still given in
the reference position, $A_{n@Ref}$. The final address change
field $DA_{Ref,n}$ is then calculated as
$DA_{Ref,n} = A_{n@Ref} - A_{Ref}$.
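The smile-then-nod cascade can be sketched for two coordinate
dimensions. This is an illustration only, assuming NumPy, a
single smile factor, and a 2-by-3 affine matrix acting on
(V, H, 1); the function name smile_then_nod, the toy loading and
the particular affine matrix are hypothetical.

```python
import numpy as np

def smile_then_nod(V_ref, H_ref, smile_V, smile_H, score, affine):
    """Apply one smile factor (soft model) and then an affine 'nod'
    (hard model) to the reference coordinates; return the vertical and
    horizontal address change fields, given in the reference position."""
    # Soft step: smiled coordinates, still indexed by reference position.
    V_smiled = V_ref + smile_V * score
    H_smiled = H_ref + smile_H * score
    # Hard step: the 2x3 affine matrix acts on (V, H, 1) at every pixel.
    coords = np.stack([V_smiled, H_smiled, np.ones_like(V_smiled)])
    V_nodded, H_nodded = np.tensordot(affine, coords, axes=1)
    # Address change field: nodded coordinates minus reference coordinates.
    return V_nodded - V_ref, H_nodded - H_ref

# Reference addresses of a 4 x 5 pixel grid.
V_ref, H_ref = np.meshgrid(np.arange(4.0), np.arange(5.0), indexing="ij")
smile_V = np.zeros_like(V_ref)
smile_V[2, :] = [1.0, 0.0, -1.0, 0.0, 1.0]   # hypothetical mouth-line loading
smile_H = np.zeros_like(H_ref)
shift = np.array([[1.0, 0.0, 2.0],           # V' = V + 2
                  [0.0, 1.0, -1.0]])         # H' = H - 1

DV, DH = smile_then_nod(V_ref, H_ref, smile_V, smile_H, score=0.5, affine=shift)
```

Rotation, scaling or shearing would be expressed by replacing
the left 2-by-2 part of the affine matrix; a pure translation is
used here only to keep the result easy to check by hand.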



ENCODING 

Generally, the encoding process includes establishing the
spatial model parameters $X_{Ref}$ for one or more reference
images or models and then estimating the temporal scores $U_n$
and residuals $E_n$ for each frame. The encoding process may be
fully manual, fully automatic or a mix of manual and automatic
encoding. The encoding process is carried out for intensity
changes, movement changes, distortions and probabilistic
statistical changes.
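For the simplest, non-moving intensity case, the estimation of
scores and residuals for one frame against fixed loadings can be
sketched as an ordinary least-squares fit. This is an assumption
made for illustration, not the estimation procedure of the
specification; it uses NumPy, and the function name
estimate_scores and the toy data are hypothetical.

```python
import numpy as np

def estimate_scores(x_n, X_ref0, loadings):
    """Least-squares scores u_n and residual e_n for one frame, given the
    reference X_Ref(0) and an F-by-M matrix of change factor loadings."""
    change = x_n - X_ref0                       # change field for this frame
    u_n, *_ = np.linalg.lstsq(loadings.T, change, rcond=None)
    e_n = change - u_n @ loadings               # part the factors cannot explain
    return u_n, e_n

# Toy data (assumption): two loadings over 30 pixels, noise-free frame.
rng = np.random.default_rng(2)
loadings = rng.normal(size=(2, 30))
X_ref0 = rng.normal(size=30)
x_n = X_ref0 + 0.7 * loadings[0] - 0.2 * loadings[1]
u_n, e_n = estimate_scores(x_n, X_ref0, loadings)
```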

Manual Encoding 

In one embodiment according to the present invention, video
sequences may be modelled manually. In the case of manual
modelling, an operator controls the modelling and interprets the
sequence of the input video data. Manual modelling may be
performed using any of a number of available drawing tools, such
as "Corel Draw" or "Aldus Photoshop", or other specialized
software.

Since humans are fairly good at intuitively discriminating
between smile, blush and segmenting, the encoding process
becomes mainly a matter of conveying this information to a
computer for subsequent use, rather than having a computerized
process develop these complicated relationships.

If there are reasons for using separate models, such as if the
sequence switches between different clips, the clip boundaries
or cuts may be determined by inspection of the sequence. Related
clips are grouped together into a scene. The different scenes
can then be modelled separately.

For a given scene, if there are regions which exhibit correlated
changes in position or intensity, these regions are isolated as
holons by the human operator. These regions may correspond to
objects in the sequence. In addition, other phenomena such as
shadows or reflections may be chosen as holons. In the case of a
complex object, it may be advantageous to divide the object into
several holons. For instance, instead of modelling an entire
walking person as one holon, it may be easier to model each
portion, e.g., limb, separately.

For each holon, the frame where the holon is best represented
spatially is found by inspection. This is referred to as the
reference frame. A good representation means that the holon is
not occluded by or affected by shadows from other holons, is not
significantly affected by motion blur, and is as representative
for as much of the sequence as possible. If a good
representation cannot be found in any specific frame in the
sequence, the holon representation may be synthesized by
assembling good representation portions from several different
original frames, or by retouching. In this case of a synthesized
holon, the reference frame is made up of only the synthesized
holon. Synthesized holons are quite adequate for partially
transparent holons such as shadows, where a smooth dark image is
often sufficient. This chosen or synthetic holon will be
included as part of the reference image. The intensity images of
the holons from the respective frames are extracted and
assembled into one common reference image.

Each holon must be assigned an arbitrary, but unique, holon
number. A segmentation image the same size as the reference
image is then formed, the segmentation image containing all the
holons; however, the pixel intensity for each pixel within the
holon is replaced by the specific holon number. This image is
referred to as the segmentation or S field.
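Building an S field from per-holon pixel masks can be sketched
as follows. This is an illustration only, assuming NumPy; the
function name build_s_field, the background value 0 for pixels
outside every holon, and the overlap check are assumptions made
for the example rather than requirements stated in the text.

```python
import numpy as np

def build_s_field(shape, holon_masks, background=0):
    """Build a segmentation (S) field in which each pixel holds the number
    of the holon that covers it in the reference image."""
    s_field = np.full(shape, background, dtype=np.int32)
    for number, mask in holon_masks.items():
        # Assumption: holons do not overlap in the reference image.
        if np.any(s_field[mask] != background):
            raise ValueError(f"holon {number} overlaps another holon")
        s_field[mask] = number
    return s_field

# Hypothetical masks for three holons in a 4 x 6 reference image.
taxi = np.zeros((4, 6), dtype=bool)
taxi[2:, :2] = True
building = np.zeros((4, 6), dtype=bool)
building[:, 2:4] = True
car = np.zeros((4, 6), dtype=bool)
car[3:, 4:] = True

s_field = build_s_field((4, 6), {1: taxi, 2: building, 3: car})
```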

Holon depth information is obtained by judging occlusions,
perspective or any other depth clue, in order to arrange the
holons according to depth. If there are several possible choices
of depth orderings, e.g., if two holons in the sequence never
occlude each other and appear to have the same depth, an
arbitrary order is chosen. If no single depth ordering is
possible, because the order changes during the sequence, e.g.,
holon A occludes holon B at one time while holon B occludes
holon A at another time, one of the possible depth orderings is
chosen arbitrarily. This depth ordering is then converted into a
depth scale in such a way that zero corresponds to something
infinitely far away and full scale corresponds to essentially
zero depth, i.e., nearest to the camera. Depth scale may
conveniently be specified or expressed using the intensity scale
available in the drawing tool, such that infinitely far away
objects are assigned an intensity of zero, and very close
objects are assigned full scale intensity. Based on this depth
ordering, an image is then formed having the same size as the
reference image; however, each pixel value has an intensity
value functioning as a depth value. This image is referred to as
the Z field.
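Converting a depth ordering into a Z field can be sketched as
below. This is an illustration under stated assumptions: NumPy
is used, the drawing tool is taken to have an 8-bit intensity
scale, and the holons are spaced evenly along that scale, which
is one possible choice and not one prescribed by the text. The
function name build_z_field and the toy S field are
hypothetical.

```python
import numpy as np

def build_z_field(s_field, order_far_to_near, full_scale=255):
    """Paint each holon with a depth intensity: 0 means infinitely far
    away, full_scale means nearest to the camera."""
    z_field = np.zeros(s_field.shape, dtype=np.uint8)
    count = len(order_far_to_near)
    for rank, holon in enumerate(order_far_to_near, start=1):
        # Evenly spaced depth values (assumption); nearest holon gets full scale.
        z_field[s_field == holon] = round(full_scale * rank / count)
    return z_field

# Toy S field: 0 is background, 1 to 3 are holon numbers.
s_field = np.array([[0, 2, 2, 0],
                    [1, 2, 2, 3]])
z_field = build_z_field(s_field, order_far_to_near=[2, 1, 3])
```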

Manual modelling or encoding also includes determining holon
opacity information. Opacity is determined by first forming an
image that has maximum intensity value for completely opaque
pixels, zeros for entirely transparent pixels, and intermediate
values for the remaining pixels. Typically, most objects will
have the maximum value (maximum opacity) for the interior
portion and a narrow zone with intermediate values at the edges
to make it blend well with the background. On the other hand,
shadows and reflections will have values at approximately half
the maximum. This image which indicates opacity is referred to
as the Prob field.

Holon movement information is obtained by first
determining the vertical and horizontal displacement
between the reference image and the reference frame for each
holon. This is carried out for selected, easily
recognizable pixels of the holons. These displacements are
then scaled so that no movement corresponds to more than
half of the maximum intensity scale of the drawing tool.
Darker intensity values correspond to vertically upward or
horizontally leftward movements. Similarly, lighter
intensity values correspond to the opposite directions, so
that maximum movements in both directions do not exceed the
maximum intensity value of the drawing tool. Two new images,
one for the vertical and one for the horizontal dimension,
collectively form the "first smile load", which is the same
size as the reference image. The scaled displacements are
then placed at the corresponding addresses in the first
smile load, and the displacements for the remaining pixels
are formed using manual or automatic interpolation.
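
The scaling of displacements onto the intensity scale of a
drawing tool might look like the sketch below. It assumes an
8-bit scale, that zero displacement is drawn as mid-gray, and
that max_abs is the largest displacement magnitude to be
represented; these choices and the function names are
hypothetical and serve only to illustrate the idea.

```python
def encode_displacement(d, max_abs, full_scale=255):
    """Map a signed displacement to an intensity value.

    Negative values (upward or leftward) come out darker than
    mid-gray, positive values lighter. max_abs must be positive.
    """
    mid = full_scale / 2
    return round(mid + (d / max_abs) * mid)


def decode_displacement(value, max_abs, full_scale=255):
    """Recover the (quantized) displacement from an intensity value."""
    mid = full_scale / 2
    return (value - mid) / mid * max_abs
```

Because the intensities are quantized, a decoded displacement
only approximates the original one; a larger max_abs gives a
coarser step.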

The first smile load should preferably be verified
by preparing all of the above-described fields for use in
the decoder, along with a table of score values (this table
is referred to as the "Time Series"). Next, the scores
for the first smile factor are set to 1 for all holons which
form part of a test frame, which is then decoded. The
resulting decoded frame should provide good reproduction of
the holons in their respective reference frame (except for
blush effects, which have not yet been addressed). If this
is not the case, the cause of each particular error can
easily be attributed to an incorrect smile score or load,
which may be adjusted, and the process is then repeated
using the new values. This process correctly establishes how
to move holons from the reference image position to the
reference frame position.



Next, the movement of holons between frames must
be estimated. For each holon, a frame is selected where the
holon has moved in an easily detectable manner relative to
the decoded approximation of the reference frame, I_m, which
is referred to as an intermediate frame. The same procedure
used for determining the first smile load is carried out,
except that now movement is measured from the decoded
reference frame to the selected new frame, and the resulting
output is referred to as the "second smile load." These
displacements are positioned in the appropriate locations in
the reference image, and the remaining values are obtained
by interpolation. The smile scores for both the first and
second smile loads for all holons are set to 1, and then the
selected frame is decoded. The result should be a good
reproduction of the selected frame (except for blush
effects, which have not yet been addressed).

The movement for the remaining frames in the
sequence is obtained by merely changing the smile scores
using trial and error based on the already established smile
loads. Whenever a sufficiently good reproduction of the
movement cannot be found using the already established smile
factors only, a new factor must be introduced according to
the method outlined above. The displacement for selected
features (pixels) between each decoded intermediate frame
I_m and the corresponding frame in the original sequence is
measured and the result stored in the reference image
position. The remaining pixels are obtained by
interpolation, and the final result is verified and any
necessary correction performed.



When the above process for calculating smile
factors has produced sufficiently accurate movement
reproduction, blush factors may then be introduced. This may
be performed automatically by working through each frame in
the sequence, decoding each frame using the established
smile factors, and calculating the difference between each
decoded frame and the corresponding frame in the original
sequence. This difference is then moved back to the
reference position and stored. Singular value decomposition
may then be performed for the differences represented in the
reference position, in order to produce the desired blush
loads and scores.

Addition of nod factors
Nod and smile factors may be combined in several
ways, two of which will be discussed. In the first method,
movement can be described as one contribution from the smile
factors and one contribution from the nod factors, with the
two contributions being added together. In the second
method, the pixel coordinates can first be smiled and then
nodded.

In the first method, i.e., additive nod and smile
factors, the decoding process for one pixel in the reference
image adds together the contributions from the different
smile factors, and calculates the displacement due to the
nod factors using the original position in the reference
image. These two contributions are then added to produce
the final pixel movement.

In the second method, i.e., cascaded nod and smile
factors, the decoding process first adds together the
contributions from the different smile factors, and then
applies the nod factors to the already smiled pixel
coordinates.

The first method is somewhat simpler to implement,
while the second method may produce a model which
corresponds more closely to the true physical interpretation
of sequences where nod factors correspond to large movements
of entire objects and smile factors correspond to small
plastic deformations of large objects.
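
The difference between the additive and the cascaded
combination can be illustrated for a single pixel as
follows. The sketch assumes that a nod factor is given as a
2D affine transform ((a, b, c), (d, e, f)) acting on
(vertical, horizontal) coordinates and that each smile load
contributes a (vertical, horizontal) displacement per pixel;
this representation and all function names are assumptions
made for the example.

```python
def apply_nod(nod, v, h):
    """Apply an affine transform ((a, b, c), (d, e, f)) to (v, h)."""
    (a, b, c), (d, e, f) = nod
    return a * v + b * h + c, d * v + e * h + f


def smile_displacement(scores, loads_at_pixel):
    """Sum score * load over all smile factors for one pixel."""
    dv = sum(s * lv for s, (lv, lh) in zip(scores, loads_at_pixel))
    dh = sum(s * lh for s, (lv, lh) in zip(scores, loads_at_pixel))
    return dv, dh


def decode_additive(v, h, scores, loads_at_pixel, nod):
    """Nod displacement is computed at the original reference position."""
    dv, dh = smile_displacement(scores, loads_at_pixel)
    nv, nh = apply_nod(nod, v, h)
    return v + dv + (nv - v), h + dh + (nh - h)


def decode_cascaded(v, h, scores, loads_at_pixel, nod):
    """The pixel is smiled first, then the nod acts on the smiled position."""
    dv, dh = smile_displacement(scores, loads_at_pixel)
    return apply_nod(nod, v + dv, h + dh)
```

For a nod that is a pure translation the two methods give
the same result; they differ once the nod scales, rotates or
shears, because the cascaded form also transforms the smile
displacement itself.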

The process of extracting smile factors can be
extended to also include nod factors, which are used to
represent movements of solid objects (affine
transformations). Essentially, nod factors are special cases
of smile factors. Specifically, each time a new smile factor
has been calculated for a holon, it can be approximated by a
nod factor. This approximation will be sufficiently
accurate if the smile loads possess characteristics such
that, for the vertical and horizontal dimensions, movement
of a pixel can be considered as a function of its vertical
and horizontal position which can be fitted to a specific
plane through 3-dimensional space. Nod factors essentially
correspond to the movement of rigid objects. The
approximation will be less accurate when the smile factors
correspond instead to plastic deformations of a holon.

To establish the nod loads, the smile loads are
projected onto three "nod loads" of the same size as the
extended reference image. The first nod load is an image
where each pixel value is set to the vertical address of
that pixel. The second nod load is an image where each
pixel value is set to the horizontal address of that pixel.
Finally, the third nod load is an image consisting of all
ones.
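
One way to read this projection is as an ordinary
least-squares fit of one dimension of a smile load to the
plane a*v + b*h + c, where v and h are the vertical and
horizontal pixel addresses. The sketch below follows that
reading; the use of least squares, the small linear solver
and the function names are assumptions, not details given in
the description.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, size), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y
                            for x, y in zip(aug[row], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def project_onto_nod_loads(smile_load):
    """Fit smile_load[v][h] to a*v + b*h + c; return [a, b, c]."""
    gram = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for v, row in enumerate(smile_load):
        for h, value in enumerate(row):
            basis = (float(v), float(h), 1.0)  # the three nod loads
            for i in range(3):
                rhs[i] += basis[i] * value
                for j in range(3):
                    gram[i][j] += basis[i] * basis[j]
    return solve(gram, rhs)


def residual_energy(smile_load, coeffs):
    """Sum of squared differences between the load and the fitted plane."""
    a, b, c = coeffs
    return sum((value - (a * v + b * h + c)) ** 2
               for v, row in enumerate(smile_load)
               for h, value in enumerate(row))
```

A residual close to zero indicates that the smile factor is
well approximated by a nod factor, while a large residual
suggests a plastic deformation that should stay a smile
factor.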

In the case of a nod factor added to a smile
factor, i.e., additive nod, the above procedure for
extracting new smile factors may be utilized. However, for
the case of a cascaded nod factor, i.e., encoding using
first a nod factor and then a smile factor, one additional
step must be performed in the encoding process. Whenever a
new smile load is estimated based on an intermediate frame
I_m which has been produced using nod factors, not only must
the position in I_m of the displacement be mapped back to
the reference image, but the actual displacements must also
be mapped back using the inverse of the nod factor. In the
case of cascaded nod and smile, in the decoder, each frame
is first "smiled" and then "nodded."

DEEPENING NOD

In the general case of one nod factor per holon,
the nod factors transmitted to the decoder consist of one
set of nod parameters for each holon for each frame.
However, there may be strong correlations between the nod
parameters between holons and between frames. The
correlations between holons may be due to the fact that the
holons represent individual parts of a larger object that
moves in a fairly coordinated manner, which is, however, not
sufficiently coordinated to be considered a holon itself. In
addition, when the holons correspond to physical objects,
there may also be correlations between frames due to
physical objects exhibiting fairly linear movement. When
objects move in one direction, they often continue moving at
approximately the same speed in a similar direction over the
course of the next few frames. Based on these observations,
nod factors may be deepened.

In the case of manual encoding, the operator can
usually group the holons so that there is a common
relationship among the holons of each group. This grouping
is referred to as a superholon and the individual holons
within such a group are referred to as subholons. This type
of grouping may be repeated, whereby several superholons may
themselves be subholons of a higher superholon. Both
subholons and superholons retain all their features as
holons. In the case of automatic encoding, similar groupings
can be established through cluster analysis of the nod
transforms.

The nod factors for the subholons of one
superholon may be separated into two components, the first
component used to describe movements of the superholon and
the second component used to describe movement of that
individual subholon relative to the superholon.

The deepening of the nod factors between frames
includes determining relationships between frames for nod
factors belonging to the same holon, be it a standard holon,
superholon or subholon. This is accomplished by dividing
the nod factors into a static part, which defines a starting
position for the holon; a trajectory part, which defines a
trajectory the holon may follow; and a dynamic part, which
describes the location along the trajectory for a specific
holon in a given frame. Both the static and trajectory
parts may be defined according to the reference image or to
the nod factors of superholons.

The deepened nod factors represent sets of affine
transforms and may be represented as a set of matrices, see
William M. Newman and Robert F. Sproull, Principles of
Interactive Computer Graphics, page 57 (McGraw Hill 1984),
which is incorporated herein by reference. The static part
corresponds to one fixed matrix. The trajectory and dynamic
parts correspond to a parameterized matrix, the matrix being
the trajectory part and the parameter being the dynamic
part, see Newman & Sproull, page 58, which is incorporated
herein by reference. These transforms may be concatenated
together with respect to the relationships between the
static, trajectory and dynamic parts. The transforms may
also be concatenated together with respect to the
combinations of several behaviors along a trajectory, as
well as with respect to the relationships between
superholons and subholons, see Newman & Sproull, page 58,
which is incorporated herein by reference.
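
The concatenation of static, trajectory and dynamic parts
can be sketched with 3x3 homogeneous matrices. The sketch
assumes column vectors (v, h, 1), a trajectory expressed as
a function that turns the dynamic parameter into a matrix,
and the multiplication order parent * static * trajectory;
these conventions and the function names are illustrative
assumptions only.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tv, th):
    return [[1, 0, tv], [0, 1, th], [0, 0, 1]]


def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def holon_transform(static, trajectory, dynamic, parent=None):
    """Concatenate the static part with the parameterized trajectory part.

    static:     fixed matrix (starting position of the holon)
    trajectory: function mapping the dynamic parameter to a matrix
    dynamic:    location along the trajectory for the current frame
    parent:     optional transform of the enclosing superholon
    """
    own = matmul(static, trajectory(dynamic))
    return matmul(parent, own) if parent is not None else own


def transform_point(m, v, h):
    """Apply a homogeneous transform to the point (v, h)."""
    return (m[0][0] * v + m[0][1] * h + m[0][2],
            m[1][0] * v + m[1][1] * h + m[1][2])
```

With this layout only the dynamic parameter changes from
frame to frame, while the static and trajectory parts are
sent once per holon.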

The above operations may be readily performed by a
human operator utilizing: a method for specifying full
affine transform matrices without parameters; a method for
storing transform matrices with sufficient room for one
parameter each specifying translation, scaling, rotation or
shear; a method for specifying which transform matrices
should be concatenated together in order to form new
transform matrices; and a method for specifying which
transform (which may be a result of concatenating several
transforms) should be applied to each holon.

Automatic Encoding 

In the case of automatic or semi-automatic
encoding, the encoding process may be iterative, increasing
the efficiency of the encoding with each iteration. An
important aspect of automatic encoding is achieving the
correct balance between intensity changes and address
changes, because intensity changes may be modelled
inefficiently as address changes and vice versa. Thus, in
the modelling of the domains it is critical that the
respective scores and residuals be estimated by a process
which avoids inefficient modelling of intensity changes as
address changes and vice versa. This is accomplished by
building the sequence model in such a way that blush
modelling is introduced only when necessary, and by making
sure that the model parameters have applicability to
multiple frames. A preferred embodiment involving full
sequence modelling, and an alternative embodiment involving
simplified sequence modelling, will be described herein. In
the present description, the individual building blocks of
the encoder will first be presented at a fairly high level,
and then the operation and control of these building blocks
will be described in more detail.

Automatic Encoder Overview

Automatic or semiautomatic encoding according to
the present invention in the case of video sequence data
will be described in detail with reference to Figures 6-13.
Figure 6 is a block diagram of an encoder according to the
present invention. Figure 7 is a block diagram of a model
estimator portion of the encoder of Figure 6. Figures 8-10
show details and principles of a preferred embodiment of the
ChangeFieldEstimator part of the ModelEstimator.
Figure 11 shows details of the LocalChangeFieldEstimator
part of the ChangeFieldEstimator.
Figure 12 outlines the Interpreter of the ModelEstimator.
Figure 13 outlines the separate Decoder.

High Level Encoder Operation 

The input data (610), which may be stored on a
digital storage medium, consists of the video sequence x_seq
with input images for frames n = 1, 2, ..., nFrames. This
input includes the actual intensity data i_seq, with
individual color channels according to a suitable format for
color representation, e.g. [R_seq, G_seq, B_seq], and some
suitable spatial resolution format. The input also consists
of implicit or explicit 2D coordinate address or location
data a_seq for the different pixels or pels. Thus, the video
sequence x_n for each frame consists of i_n, a_n and p_n
information.

WO 95/08240    PCT/US94/10190

Finally, x_seq may also consist of probabilistic
qualities p_seq to be used for enhancing the IDLE encoding.
These data consist of the following results of preprocessing
of each frame: (a) Modelability, which is an estimate of the
probability that the different parts of a frame are easily
detectable in preceding or subsequent frames; (b) HeteroPel,
which indicates the probability that the pels represent
homogeneous or heterogeneous optical structures.

The automatic encoder according to the present
invention consists of a high-level Multipass Controller 620
and a ModelEstimator 630. The Multipass Controller 620
optimizes the repeated frame-wise estimation performed for a
series of frames of a given sequence. The ModelEstimator
630 optimizes the modelling of each individual video frame
n.

In the preferred embodiment, a full sequence model with
parameters in the different domains is gradually expanded
("extended" and "widened") and refined ("deepened" or
statistically "updated") by including information from the
different frames of a sequence. The full sequence model is
further refined in consecutive, iterative passes through the
sequence.

In contrast, in the alternative embodiment
involving simplified modelling, a set of competing extra
sequence models are developed in the different domains and
over a number of different frames, in order to model the as
yet unmodelled portion of the input frames. It should be
noted that the modelled portion of the input frames has
been modelled using the established sequence model X_Ref.
Each of these competing extra models has parameters in only
one single domain. The number of frames (length of a pass)
used to estimate parameters in each of the domains depends
on how easily the frames are modelled. At the end of the
pass in each domain, the full sequence model is then
"widened" or "extended" by choosing a new factor or
segmentation from the competing extra domain model that has
shown the best increase in modelling ability for the frames.
This embodiment is described in detail in Appendix II
SIMPLIFIED ENCODER.

The ModelEstimator 630 takes as input the data for
each individual frame (640), consisting of [i_n, a_n and
p_n] as defined above. It also takes as input a preliminary,
previously estimated model X_Ref (650) as a stabilizing
input for the sequence. As output, the ModelEstimator 630
delivers a reconstructed version of the input image, x_n hat
(660), and a corresponding lack-of-fit residual
e_n = x_n - x_n hat (665), plus an improved version of the
model X_Ref (655).

The ModelEstimator 630 may also input/output
LocalModels 670 for the data structures in the vicinity of
frame n.

Additionally, the ModelEstimator 630 may take as
input pre-established model elements from an external
ModelPrimitives data base 680, which may consist of spatial
and temporal models of movement patterns, e.g. a human face
or body, running water, moving leaves and branches, and
simpler modelling elements such as polyhedral object models
(see David W. Murray, David A. Castelow and Bernard
Buxton, "FROM IMAGE SEQUENCES TO RECOGNIZED MOVING
POLYHEDRAL OBJECTS", International Journal of Computer
Vision, 3, pp. 181-208, 1989, which is incorporated herein
by reference).

The ModelEstimator 630 also exchanges control
information 635 and 637 from and to the Multipass Controller
620. Details regarding the control parameters are not
explicitly shown in the subsequent figures.

Model Estimator 

A full implementation of the ModelEstimator 630 of
Figure 6 is shown in Figure 7 for a given frame n. The
ModelEstimator 630 contains a ChangeFieldEstimator 710 and
an Interpreter 720. The ChangeFieldEstimator 710 takes as
primary input the data for the frame, x_n (corresponding to
640), consisting of image intensity data i_n, address
information a_n and probabilistic information p_n. It also
takes as input information from the preliminary version of
the current spatial and temporal Model X_Ref, U_seq 760
(corresponding to 650) existing at this point in time in the
encoding process. The preliminary model information 760 is
used to stabilize the estimation of the change image fields
in the ChangeFieldEstimator 710, the change fields being
used to change the intensity and other quantities of the
preliminary SequenceModel X_Ref, U_seq (760) of the extended
Reference image in order to approximate as closely as
possible the input image intensities, i_n.

The ChangeFieldEstimator 710 also inputs various
control parameters from the Multipass Controller 620 and
exchanges local control information 755 and 756 with the
Interpreter 720.

As its main output, the ChangeFieldEstimator 710
yields the estimated change image fields DX_Ref,n (730),
which are used to change the spatial and temporal parameters
of the preliminary SequenceModel X_Ref, U_seq (760) of the
extended Reference image in order to approximate, as closely
as possible, the input image intensities, i_n. It also
yields preliminary model-based decoded (reconstructed)
versions of the input image, x_n hat (640), and the
corresponding lack-of-fit residuals e_n (645).

The ChangeFieldEstimator 710 also yields local
probabilistic quantities w_n (750), which contain various
warnings and guidance statistics for the subsequent
Interpreter 720. Optionally, the ChangeFieldEstimator 710
inputs and updates local models 670 to further optimize and
stabilize the parameter estimation process.

The Interpreter 720 determines the estimated
change image fields DX_Ref,n 730 as well as the preliminary
forecast x_n hat and residual e_n, plus the estimation
warnings w_n 750 and control parameters output from the
Multipass Controller 620. Optionally, the Interpreter 720
receives input information from the external data base of
model primitives, 780. These model primitives are of several
types: Sets of spatial loadings or temporal score series
previously estimated from other data may be included in the
present IDLE model in order to improve compression or model
functionality. One example of usage of spatial loading
models is when already established general models for mouth
movements are adapted into the modelling of a talking
person's face in picture telephone encoding. Thereby a wide
range of mouth movements becomes available without having to
estimate and store/transmit the detailed factor loadings;
only the parameters for adapting the general mouth movement
loadings to the present person's face need to be estimated
and stored/transmitted.

Similarly, including already established movement
patterns into an IDLE model is illustrated by using
pre-estimated score time series for the movement of a
walking and a running person in video game applications. In
this case the pre-established scores and their corresponding
smile loadings must be adapted to the person(s) in the
present video game reference image, but the full model for
walking and running people does not have to be estimated.

A third example of the use of model primitives is
the decomposition of the reference image into simpler,
pre-defined geometrical shapes (e.g. polygons) for still
image compression of the reference model X_Ref.

The Interpreter subsequently modifies the contents
of the SequenceModel 760 and outputs this as an updated
SequenceModel (765), together with a modified model-based
decoded version of the input image, x_n hat (770), and the
corresponding lack-of-fit residual e_n (775). Upon
convergence (determined in the Multipass Controller 620),
these outputs are used as the outputs of the entire
ModelEstimator (630).

Change Field Estimator

Figure 8 is a block diagram representation of a
ChangeFieldEstimator 710 according to a preferred embodiment
of the present invention. As shown in Figure 8, an input
frame x_n, which has been converted into the correct format
and color space used in the present encoder, is provided to
the ChangeFieldEstimator 710. The SequenceModel X_Ref (760),
in whatever form available at this stage of the model
estimation, is also input to the ChangeFieldEstimator 710.
The main output from the ChangeFieldEstimator 710 is the
change image field DX_Ref,n (890), which converts the
SequenceModel X_Ref 810 into a good estimate of the input
frame x_n.

The ChangeFieldEstimator 710 may be implemented
in either of two ways. First, in the preferred embodiment,
the change fields are optimized separately for each domain,
and the optimal combination determined iteratively in the
Interpreter 720. Alternatively, the change fields may be
optimized jointly for the different domains within the
ChangeFieldEstimator 710. This will be described in more
detail below.

Additional outputs include the preliminary
estimate, x_n hat (892), the difference between the input
and preliminary estimate, e_n (894), together with warnings
w_n (896).



Forecasting position m

For both computational and statistical reasons, it
is important to simplify the estimation of the change field
as much as possible. In the present embodiment of the
change field estimator, this is accomplished by forecasting
an estimate x_m which should resemble the input frame x_n,
and then only estimating the local changes in going from x_m
to x_n in order to represent each input frame more
accurately.

As will be described in more detail below, the
ChangeFieldEstimator 710 of the present preferred embodiment
initially utilizes an internal Forecaster 810 and
Decoder 830 to forecast an estimate, termed x_m 835, to
resemble the input frame x_n. The Forecaster (810) receives
as input the temporal SequenceModel U_seq (811) and outputs
forecasted temporal scores u_m (815), which are then input
to the Decoder (830). The Decoder 830 combines these scores
with the spatial sequence model X_Ref 831, yielding the
desired forecasted frame x_m (835). Additional details
regarding the decoder are set forth below.

Estimating local change field from m to input frame n

Next, a LocalChangeFieldEstimator (850) is
employed to estimate the local change field needed to go
from the forecasted x_m to the actual input frame x_n. This
change is referred to as the estimated local change field
dx_mn (855), and contains information in several domains,
mainly movement and intensity change, as will be discussed
in detail below.

In the estimated local change field dx_mn, the data
on how to change the content of the forecast x_m are given
for each pixel in the "m position", i.e. in the position
where the pixel is positioned in the forecasted frame x_m.
In order to be able to model these new change field data
together with corresponding change field data obtained
previously for other frames, it is important to move the
change field data for all frames to a common position. In
the present embodiment, this common position is referred to
as the Reference position, or reference frame X_Ref. This
movement back to the common reference position will be
described below. Note that capital letters will be used to
designate data given in this reference position of the
extended reference image model, while lower-case letters
will be used for data given in the input format of the image
x_n and approximations of the input image x_n.

An auxiliary output from the Decoder 830 is the
inverse address change field, da_m,Ref, that allows a Mover
operator 870 to move the obtained local change field
information dx_mn from being given in the m position back to
the common Reference position. This moved version of the
dx_mn output is referred to as DX_mn@Ref 875, with capital
letters denoting that the information is now given in the
reference position.

The LocalChangeFieldEstimator 850 may also
receive the full model X_Ref moved to the m position
(X_Ref@m 836), plus correspondingly moved versions of
DX_Ref,m 825, and the return smile field da_m,Ref 865 as
inputs (not shown) from the Decoder 830, for use in internal
stabilization of the parameter estimation for dx_mn 835.

Estimating the full change field for frame n

The next step in the encoding process is to
determine the full estimated change field in going from the
Reference position to the estimated position of input frame
n. This is accomplished by presenting the change field
DX_Ref,m originally used for transforming X_Ref to x_m to
Adder 880 together with the obtained DX_mn@Ref, yielding the
desired main output, DX_Ref,n.

Illustration of local change estimation

The use of the forecasted position m, which has
been described above, is illustrated conceptually in Figure
9 for the case of an address change DA for a given pel in an
image representing a moving object. The determination of
DA_Ref,n (as part of the change field DX_Ref,n) is
represented as element 902 in Figure 9. The estimation of
DA_Ref,n is a four-stage process.

The first step is to determine the forecast change
field that moves spatial information from the Reference
position to the forecasted m position, resulting in an
approximation of the input frame n. This is based on the
address change field DA_Ref,m (904), represented by the
vector from point Ref to point m. This vector is determined
by forecasting, and is a part of DX_Ref,m.

Second, the local movement field from the
forecasted position m to the actual input frame n, da_mn
(926), is determined.

Third, the estimated result da_mn is "moved" or
translated back from the m position to the Reference
position, using the inverse movement field da_m,Ref (905)
(i.e., the vector from the m position to the Reference
position), thus yielding DA_mn@Ref (936).

Finally, the two fields given with respect to the
Reference position Ref, i.e., DA_Ref,m and DA_mn@Ref, are
added to yield the desired DA_Ref,n (946).
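
The four stages can be sketched in one dimension as
follows. The sketch assumes integer address changes, so that
every reference pel lands exactly on a pel in the m
position, and it gathers the local field through the forward
field DA_Ref,m rather than building an explicit inverse
field; both simplifications, and the function names, are
assumptions made for illustration.

```python
def move_to_reference(da_mn, da_ref_m):
    """Stage three: bring the local field from the m position to Ref.

    da_ref_m[r] is the forecast address change of reference pel r,
    so that pel sits at m position r + da_ref_m[r]. The local change
    found at that m position is stored back at reference address r.
    """
    moved = []
    for r, shift in enumerate(da_ref_m):
        m_pos = r + shift
        # Pels forecast to fall outside the frame get no local change.
        inside = 0 <= m_pos < len(da_mn)
        moved.append(da_mn[m_pos] if inside else 0)
    return moved


def full_address_change(da_ref_m, da_mn):
    """Stage four: DA_Ref,n = DA_Ref,m + DA_mn@Ref, pel by pel."""
    da_mn_at_ref = move_to_reference(da_mn, da_ref_m)
    return [forecast + local
            for forecast, local in zip(da_ref_m, da_mn_at_ref)]
```

If the forecast field is all zeros, the function simply
returns the local field, which corresponds to the situation
at the start of encoding where the full change has to be
estimated directly.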

Thus, the function of the Mover 870 is to "move"
the local change field da_mn back to the reference image
model position Ref. All the elements in dx_mn (di_mn, da_mn
and dp_mn) are thus moved back to the Ref position. The
output of Mover 870 is DX_mn@Ref (875), which is the local
change information in going from the forecasted frame m to
the input frame n, but positioned with respect to the
Reference position Ref. The change information is "moved"
back to the reference position Ref in order to ensure that
change information obtained from frame n about a given
object is positioned together with change information
obtained from other frames about the same object. By
positioning all information about an object in the same pel
position, it is possible to develop simple models of the
systematic changes in the sequence. In this way, the system
attempts dynamically to improve the initial estimation of
input frames. In the case where the address change field
DA_Ref,m (904) is defined to be all zeros, the
LocalChangeFieldEstimator 850 has to estimate the full
change field DA_Ref,n directly as da_mn. This may for
example take place at the beginning of an encoding process,
and for frames n close to the frame used for initializing
the reference image model.

It should be noted that the local probabilistic
change information dp_mn contains extra dimensions
containing statistical descriptions of the performance of
the LocalChangeFieldEstimator (850). For these dimensions,
the corresponding change field in DA_Ref,m is considered as
being empty. These additional dimensions are used by the
Interpreter (720) for encoding optimization. These
dimensions may, for example, reflect possible folding or
occlusion problems causing x_m to have lost some of X_Ref's
spatial information needed to estimate input frame x_n, as
well as spatial innovations in x_n needed to be included
into X_Ref at a later stage.

The LocalChangeFieldEstimator (850) also outputs
an estimate of the input frame, x_n hat (892), the
lack-of-fit residual e_n (894) and certain interpretation
warnings w_n (896). These are also passed on to the
Interpreter (720), where they are used for encoding
optimization.

The input and output of Local Model information
(899) for the LocalChangeFieldEstimator will be discussed in
detail below.

Change Field Estimator 

The LocalChangeFieldEstimator 850 of Figure 8
is shown in more detail in Figure 10, with each domain I, A
and P illustrated separately. It should be noted that each
of these domains again contains subdomains (e.g. R, G, B in
I; V, H, Z in A). For purposes of simplicity, these are not
illustrated explicitly.

Referring now to Figure 10, which is a more detailed
illustration of the main parts of the Change Field
Estimator of Figure 8, the available temporal score
estimates for the sequence are used in the Forecaster 1010 to
yield forecasted factors or scores for frame m in the three
domains: Intensity (uI_m), Address (uA_m) and Probabilities
(uP_m).

Internal decoder portion of encoder 

ChangeField Maker

The internal decoder portion of the encoder includes
ChangeField Maker 1020, Adder 1030 and Mover 1040,
which operate on their associated input, output and internal
data streams. In the first stage (change field maker) of the
decoder portion internal to the encoder, the factors or
scores are combined with the corresponding spatial factor
loadings available in the (preliminary) spatial model X_Ref in
the ChangeField Maker 1020 to produce the forecast change
fields. For each domain I, A and P, and for each of their
subdomains, the estimated factor scores and factor loadings
are multiplied and the result accumulated, yielding the
forecast change fields DI_Ref,m, DA_Ref,m, DP_Ref,m.
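
The multiply-and-accumulate step of the ChangeField Maker can
be illustrated with a minimal sketch. The function name, the
list-based data layout and the numeric values below are
assumptions made for illustration only; the sketch covers one
subdomain (e.g. red intensity) of one holon and is not the
encoder's actual implementation.

```python
def make_change_field(loadings, scores):
    """Accumulate sum over factors f of loadings[f][pel] * scores[f]."""
    if len(loadings) != len(scores):
        raise ValueError("one score is needed per factor loading")
    n_pels = len(loadings[0]) if loadings else 0
    field = [0.0] * n_pels
    for loading, score in zip(loadings, scores):
        for pel, weight in enumerate(loading):
            field[pel] += weight * score
    return field


# Two hypothetical red-intensity factor loadings over four pels.
red_loadings = [[1.0, 0.0, 2.0, 0.0],
                [0.0, 1.0, 0.0, -1.0]]
# Hypothetical forecasted scores for frame m, one per factor.
red_scores = [0.5, 2.0]

forecast_red_change = make_change_field(red_loadings, red_scores)
print(forecast_red_change)
```

The same routine would be applied to every subdomain of the
I, A and P domains in turn.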

For simplicity, the additional functionality of hard
modelling is not included in Figures 8 and 10 for the
internal decoder portion of the encoder. This will instead be
discussed below in conjunction with the separate Decoder of
Figure 13, together with various other additional details, as
the separate Decoder is essentially identical to the present
internal decoder portion of the encoder.



Adder 

In the second stage (adder) of the decoder, the
change fields are added to the corresponding basic
(preliminary) spatial images in Adder 1030, i.e., the extended
reference image intensities I_Ref(0) (e.g. RGB), the (implicit)
extended reference image addresses A_Ref(0) (e.g. VHZ) and the
extended reference image probabilities P_Ref(0) (e.g. opacity).
This results in I_m@Ref, A_m@Ref and P_m@Ref.


Mover 

The forecast change fields are transformed in
Mover 1040 in accordance with the movement field DA_Ref,m (904
in Fig. 9), yielding the forecasted intensity image i_m (e.g.
in RGB), forecasted address image a_m (e.g. VHZ) and
forecasted probabilistic image p_m (e.g. opacity). Together,
these forecasted data portions form the forecast output x_m
(835 in Figure 8) from Decoder 830 of Figure 8.
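
The Adder and Mover stages can be sketched in one dimension as
follows. This is only an illustration under strong simplifying
assumptions: addresses are integer pel shifts along a single
axis, no interpolation is performed, and occlusions are
resolved simply by letting later pels overwrite earlier ones
rather than by depth. All names and values are hypothetical.

```python
def add_change_field(reference, change):
    """Adder: reference image plus forecast change field, pel by pel."""
    return [r + d for r, d in zip(reference, change)]


def move(values, address_change, fill=None):
    """Mover: place each reference pel at pel + shift in the output.

    Output pels that receive no value keep `fill`; pels moved outside
    the frame are discarded.
    """
    out = [fill] * len(values)
    for pel, (value, shift) in enumerate(zip(values, address_change)):
        target = pel + shift
        if 0 <= target < len(out):
            out[target] = value
    return out


i_ref = [10.0, 20.0, 30.0, 40.0]   # reference intensities
di_ref_m = [1.0, 0.0, -2.0, 0.0]   # forecast intensity change field
da_ref_m = [1, 1, 1, 1]            # forecast address change field

i_m_at_ref = add_change_field(i_ref, di_ref_m)
i_m = move(i_m_at_ref, da_ref_m)
print(i_m)
```

The uncovered pel at the left edge stays unfilled, which is the
kind of region for which the estimator described below would
issue a warning.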

Local ChangeField Estimator

The Local ChangeField Estimator (850) estimates
how to change the forecasted image x_m generated in the
Decoder 830, in one or more domains, primarily the intensity
domain, in order to accurately approximate the input frame,
x_n. The resulting estimated changes are referred to as the
Local Change Fields dx_mn.

The sequence model loadings, moved from the reference
position to the forecasted position, x_Ref@m 837, may be
used as input for statistical model stabilization. In
addition, Local Models 899 may be used to stabilize this
estimation. A Local Model may be a special case model
optimized for a particular subset of frames.

Separate versus joint domains in change field estimation

In the case of joint domain estimation of the
local change fields in the ChangeField Estimator 710, some
m-n deviations are attributed to intensity differences di_mn,
while some m-n deviations are instead attributed to
movements da_mn, and additional m-n deviations are attributed
to segmentation and other probabilistic differences dp_mn. The
ChangeField Estimator 710 then requires internal logic and
iterative processing to balance the different domains so
that the same m-n change is not modelled in more than one
domain at the same time. Since the resulting local change
field dx_mn already contains the proper balance of the
contributions from the different domains, this simplifies the
remaining portion of the encoding process.

However, when dealing with joint local change
field domains, the Local ChangeField Estimator 850 must make
iterative use of various internal modelling mechanisms in
order to balance the contributions from the various domains.
Since these internal mechanisms (factor-score estimation,
segmentation) are already required in the Interpreter (to
balance the contributions of different frames), the
preferred embodiment instead employs separate modelling of the
various change field domains in the Local ChangeField
Estimator 850. This results in a much simpler design of the
Local ChangeField Estimator 850. However, the encoding
process must then iterate back and forth between the
ChangeField Estimator 710 and the Interpreter 720 several
times for each frame n, in order to arrive at an optimal
balance between modelling in the different domains for each
frame. The forecasted frame m is thus changed after each
iteration in order to better approximate x_n, and the
incremental changes in the different domains are accumulated
by the Interpreter 720, as will be described below.

Local ChangeField Estimator using separate 
domain modelling 

The primary purpose of the LocalChangeField
Estimator 850, shown in detail in Figure 11, is to estimate,
using the forecasted frame x_m 1101 and input frame x_n 1102,
the local change fields dx_mn 1103, used in going from the
forecasted frame m to the input frame n.

The Local ChangeFieldEstimator 850 employs
separate estimation of the different domains. An estimator,
EstSmile 1110, estimates the local address change fields
(smile fields) da_mn 1115, while a separate estimator,
EstBlush 1120, estimates the local intensity change fields
(blush fields) di_mn 1125. Either of these estimators may be
used to estimate the probabilistic change fields dp_mn 1126.
The embodiment of Figure 11 illustrates the case where the
probabilistic change fields are estimated by the EstBlush
estimator 1120.

In addition, both estimators 1110 and 1120 provide
approximations, 1112 and 1114 respectively, of the input
data, residuals and warnings. The warnings are used for
those image regions that are difficult to model in the given
estimator. The output streams 1112 and 1114 from the two
estimators are then provided as two separate sets of output
approximations x_n hat, residuals ex_n and warnings w_n.



EstSmile 1110 motion estimator 
The EstSmile 1110 motion estimator estimates the
local address change field da_mn primarily by comparing the
forecasted intensity i_m to the actual input intensity i_n
using any of a number of different comparison bases, e.g.,
sum of absolute differences or weighted sum of squared
differences. A variety of motion estimation techniques may
be used for this purpose, such as the frequency domain
techniques described in R.C. Gonzales and R.E. Woods, Digital
Image Processing, pp. 465-478, (Addison-Wesley, 1992), which
is incorporated herein by reference, or methods using
coupled Markov random field models as described in R. Depommier
and E. Dubois, MOTION ESTIMATION WITH DETECTION OF OCCLUDED
AREAS, IEEE 0-7803-0532-9/92, pp. III269-III272, 1992, which
is incorporated herein by reference.
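
As an illustration of the simplest comparison basis mentioned
above, the following sketch finds the one-dimensional shift of
a block of forecasted intensities that minimizes the sum of
absolute differences against the input intensities. It is a
plain exhaustive block-matching search, used here as a
stand-in and not as the preferred EstSmile technique; the
function names, block geometry and data are assumptions.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def estimate_shift(forecast, actual, start, size, max_shift):
    """Return (shift, cost) minimising SAD for one forecast block.

    The block forecast[start:start+size] is compared with
    actual[start+shift:start+shift+size] for every shift in
    [-max_shift, max_shift] that keeps the block inside the frame.
    """
    block = forecast[start:start + size]
    best_shift, best_cost = 0, None
    for shift in range(-max_shift, max_shift + 1):
        lo = start + shift
        if lo < 0 or lo + size > len(actual):
            continue
        cost = sad(block, actual[lo:lo + size])
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost


i_m = [0, 0, 5, 9, 5, 0, 0, 0]   # forecasted intensity (one scan line)
i_n = [0, 0, 0, 0, 5, 9, 5, 0]   # input intensity: pattern moved right
shift, cost = estimate_shift(i_m, i_n, start=2, size=3, max_shift=3)
print(shift, cost)
```

A weighted sum of squared differences would only require
replacing the `sad` function.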

The preferred embodiment according to the present
invention utilizes a motion estimation technique which seeks
to stabilize the statistical estimation and minimize the
need for new spatial smile loadings by using model
information already established. The spatial model structures,
moved from the reference position to the m position, x_Ref@m,
are one such type of model information. This type of model
information also includes the moved version of the estimated
weights Wgts_X_Ref, as will be described in greater detail
below.

The probabilistic domain P_Ref@m includes segment
information S_Ref@m which allows the pixels in the area of
holon edges to move differently from the interior of a
holon. This is important in order to obtain good motion
estimation and holon separation when two holons are adjacent
to each other. The EstSmile estimator 1110 itself may find
new local segments which are then passed to the Interpreter
720 as part of the warnings w_n or probabilistic properties
dp_mn. Local segments are generally sub-segments or portions
of a segment that appear to move as a solid body from the
forecasted frame m to frame n.

The address domain contains spatial address factor
loadings a(f)_Ref@m, f=0,1,2,..., in each coordinate sub-operand
and for each holon. The motion estimation seeks preferably
to accept motion fields da_mn that are linear combinations of
these already reliably established address factor loadings.
This necessitates the use of an internal score estimator and
residual changefield estimator similar to those used in the
Interpreter 720. Temporal smoothness of the scores of frame
n vs. frames n-1, n+1 etc. may then be imposed as an
additional stabilizing restriction.

The motion estimation may also include estimation
of "hard" nod factors for the different segments. These
segments may be the whole frame (for pan and zoom
estimation), the holons defined in the forecast s_m, or they
may be new local segments found by the motion estimation
operator itself.

The input uncertainty variances of the intensities
and addresses of the various inputs, x_m, x_n and x_Ref@m, are
used in such a way as to ensure that motion estimates based
on uncertain data are generally overridden by motion
estimates based on relatively more certain data. Likewise,
motion estimates based on pixel regions in the forecasted
frame m or input frame n previously determined to be
difficult to model, as judged e.g. by p_m, are generally
overridden by motion estimates from regions judged to be
relatively easier to model.

During the initial modelling of a sequence, when
no spatial model structures have as yet been determined, and
when the extracted factors are as yet highly unreliable,
other stabilizing assumptions, such as spatial and temporal
smoothness, are afforded greater weight.

The EstSmile 1110 estimator may perform the motion
estimation in a different coordinate system than that used
in the rest of the encoder, in order to facilitate the
motion estimation process.



EstBlush 1120 intensity change estimator
The EstBlush estimator 1120 estimates the local
incremental blush field di_mn, which in its simplest version
may be expressed as:

     di_mn = i_n - i_m

It should be noted that during the iterative improvement of
the estimated change fields for a given frame, it is
extremely important that the blush field used for
reconstructing the forecasted frame x_m in the Decoder 830 in
a certain iteration not be based solely on di_mn = i_n - i_m
from the previous iteration, since this would give an
artificially perfect fit between the forecasted frame m and
input frame n, thus prematurely terminating the estimation
process for better smile and probabilistic change fields.

The EstBlush estimator 1120 also detects local
changes in the probabilistic properties, dp_mn, by detecting,
inter alia, new edges for the existing holons. This may be
based on local application of standard segmentation
techniques. Changes in transparency may also be detected, based
on a local trial-and-error search for minor changes in the
transparency scores or loadings available in P_Ref@m which
improve the fit between i_m and i_n, without requiring further
blush or smile changes.

Reverse Mover 

The estimated local change fields (corresponding
to dx_mn 855 in Figure 8) are "moved" back from the forecasted
position m to the reference position Ref in the Reverse
Mover 1060, using the return address change field from m to
Ref, da_m,Ref, from the Decoder Mover 870. These outputs,
DI_mn@Ref, DA_mn@Ref and DP_mn@Ref, correspond to DA_mn@Ref 908
in Figure 9 and DX_mn@Ref in Figure 8.

Reverse Adder 

Finally, DX_mn@Ref is added to the original
forecasting change fields, DX_Ref,m [DI_Ref,m, DA_Ref,m and
DP_Ref,m] in the Reverse Adder 1070, to yield the desired
estimated change fields which are applied to the reference
model X_Ref to estimate input frame n, x_n. These change
fields of DX_Ref,n are DI_Ref,n, DA_Ref,n and DP_Ref,n.

The Local ChangeFieldEstimator 1050 also yields
residuals and predictions corresponding to e_n (894) and
x_n hat (892) in the various domains, as well as various other
statistical warnings w_n (896) in Figure 8.


Interpreter 

Interpreter Overview 
The main purpose of the Interpreter 720 is to
extract from the estimated change field and other data for
the individual frames, stable model parameters for an entire
sequence of data or portion of a sequence. The Interpreter
720, in conjunction with the Change Field Estimator 710, is
used both for preliminary internal model improvement, as
well as for final finishing of the model. In the case of
video coding, the Interpreter 720 converts change field
information into spatial, temporal, color and other model
parameters in the address, intensity and probabilistic
domains. The Interpreter 720 and the Change Field Estimator
710 are repeatedly accessed under the control of the
Multi-Pass Controller 620 for each individual frame n, for each
sequence of frames and for repeated passes through the
sequence of frames.

For a given frame n at a given stage in the
encoding process, the Interpreter 720 takes as input the
estimated change fields in the various domains, DX_Ref,n 730
(including uncertainty estimates) as well as additional
warnings w_n 750 from the ChangeField Estimator 710. The
Interpreter also receives preliminary coded data for
individual frames, x_n hat (735), and residual error e_n (745)
from the Change Field Estimator 710. The Interpreter 720 also
receives existing models {X_Ref, U_seq} 760, and may optionally
receive a data base of Model Primitives 780 for model
deepening, in addition to local model information 899, Local
Change Field Estimates dx_mn and the input frame information
x_n. The Interpreter 720 also receives and returns control
signals and parameters 635 and 637 from and to the Multipass
Controller, and 755 and 756 to and from the ChangeField
Estimator 710.

The Interpreter 720 processes these inputs and
outputs an updated version of the model {X_Ref, U_seq} 765. The
changes in this model may be spatial extensions or
redefinitions of the holon structure of the reference image
model(s), widened sub-operand models, or new or updated values
of the factor loadings X_Ref and sequence scores U_seq. The
Interpreter 720 also outputs scores u_n in the various domains
and sub-operands (772) for each individual frame n, as
well as a reconstructed frame x_n hat (770) and residuals
e_n (775). It should be noted that all of the Interpreter
outputs are expressed as both a signal value and its
associated uncertainty estimate.

The internal operational blocks of the Interpreter
720 are shown in detail in Figure 12. Referring now to
Figure 12, the Interpreter 720 includes a Score Estimator
1202 which estimates the scores u_n (1204) of factors with
known loadings for each holon and each sub-operand. The
Interpreter 720 also estimates the matrix of nod scores
corresponding to affine transformations, including scores
for moving and scaling the entire frame due to camera pan
and zoom motions. These scores are provided to the Residual
Change Estimator 1210 which subtracts out the effect of
these known factors from the Change Field input DX_Ref,n, to
produce the residual or unmodelled portion EX_n (1212). The
residuals 1212 (or the full Change Field DX_Ref,n, depending on
the embodiment) are then used by the Spatial Model Widener
1214 in order to attempt to extract additional model
parameters by analyzing these change field data obtained from
several frames in the same sequence. Since all of the
change fields from the different frames in the subsequence
have been moved back to the reference position as described
above, spatio-temporal change structures that are common to
many pixels and frames may now be extracted using factor
analysis of these change field data. New factors, which are
considered to be reliable as judged by their ability to
describe unmodelled changes found in two or more frames, are
used to stabilize the change field estimation for subsequent
frames. In contrast, minor change patterns which affect
only a small number of pixels and frames are not used for
statistical stabilization, but rather, are accumulated in
memory in case they represent emerging change patterns that
have not yet fully emerged but will become statistically
significant as more frames are brought into the modelling
process.

The Spatial Model Widener 1214 also handles additional
tasks such as 3D sorting/structure estimation and assessment
of transparency and shadow effects. The scores 1215 are
also provided to the Temporal Model Updater 1206 and Spatial
Model Updater 1208, where they are used for statistical
refinement, simplification and optimization of the models.

In the Interpreter 720, the input sequence x_n is
also provided to the Spatial Model Extender 1216 which
carries out various segmentation operations used to extract
new spatial segments from each individual frame n. The
Spatial Model Extender 1216 also merges and splits image
segments in order to provide more efficient holon
structures. The input sequence is also provided to the Model
Deepener 1218 which attempts to replace model parameters in
various domains by equivalent model parameters, but in more
efficient domains. This may, for example, include
converting "soft" modelling factors such as smile factors into
"hard" nod factors, which require less explicit information.

Detailed description of Interpreter operational blocks

The Score Estimator 1202 estimates the scores of
each individual frame n, u_n, in the various domains
(operands) and sub-operands for the various holons for use
with factors having known loadings in X_Ref. Each score
contains a value and associated estimation uncertainty. Robust
statistical estimation is used in order to balance
statistical noise stabilization (minimization of erroneous
score estimation due to noise in the loadings or input
data) against statistical robustness (minimization of
erroneous score estimation due to outlier pixels, i.e., those
pixels with innovation, i.e., change patterns not yet properly
described using the available spatial model). Detection of
outliers is described in H. Martens and T. Naes,
Multivariate Calibration, pp 267-272, (John Wiley & Sons, 1989),
which is incorporated herein by reference. Statistical
stabilization to minimize noise is achieved by combining the
impact of a larger number of pixels during the score
estimation. Statistical stabilization to minimize the effect of
outlier pixels is achieved by reducing or eliminating the
impact of the outlier pixels during the score estimation.
In a preferred embodiment, the robust estimation technique
is an iteratively reweighted least squares optimization, both
for the estimation of smile, blush and probabilistic scores
of "soft models" with explicit loadings as well as for the
nod score matrices of the affine transformations of solid
objects.
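
The iteratively reweighted least squares idea can be sketched
for a single factor as follows. The weight function
1/(1 + (r/scale)^2), the `scale` parameter, the fixed
iteration count and all names are assumptions chosen for
illustration; the text above does not specify the weighting
scheme.

```python
def irls_score(loading, change, scale=1.0, iterations=20):
    """Robustly fit change[pel] ~ score * loading[pel] for one factor.

    Each pass solves a weighted least squares problem, then lowers the
    weight of pels with large residuals (candidate outlier pixels).
    """
    weights = [1.0] * len(loading)
    score = 0.0
    for _ in range(iterations):
        num = sum(w * l * d for w, l, d in zip(weights, loading, change))
        den = sum(w * l * l for w, l in zip(weights, loading))
        if den == 0.0:
            break
        score = num / den
        residuals = [d - score * l for l, d in zip(loading, change)]
        weights = [1.0 / (1.0 + (r / scale) ** 2) for r in residuals]
    return score, weights


loading = [1.0, 2.0, 3.0, 4.0]
# The first three pels follow a score of 2; the last pel is an outlier.
change = [2.0, 4.0, 6.0, 50.0]
score, weights = irls_score(loading, change)
print(round(score, 2), [round(w, 4) for w in weights])
```

An ordinary least squares fit of the same data gives a score
of 7.6, so the reweighting is what pulls the estimate back
toward the value supported by the majority of pels.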

Two different approaches to score estimation may
be used. The first approach is a full iterative search in
the score-parameter space to optimize the approximation of
the input image x_n. The second approach is a simpler
projection of the estimated change fields DX_Ref,n onto the known
factor loadings (including the explicit loadings in X_Ref and
the implicit loadings associated with nod affine
transformations). In addition, combinations of both methods may
be used.

In the case of the iterative search in the
score-parameter space, nonlinear iterative optimization is used
to find the combinations of scores u_n in the different domains
(operands), sub-operands, holons and factors that result in
optimal decoding conversion of the model X_Ref into estimate
x_n hat. The optimization criterion is based on the lack of
fit difference (x_n - x_n hat), mainly in the intensity domain.
A different set of one or more functions may be used in
order to optimize the fit for individual holons or other
spatial subsegments. These function(s) indicate the lack of
fit due to different pixels by calculating, for example,
absolute or squared differences. The different pixel
contributions are first weighted and then added according to
the reliability and importance of each pixel. Thus, outlier
pixels are assigned a lower weight, while pixels that
correspond to visually or estimationally important lack of
fit residuals are assigned a higher weight.

The search in the score-parameter space may be a
full global search of all factor scores, or may instead
utilize a specific search strategy. In a preferred
embodiment, the search strategy initially utilizes score values
predicted from previous frames and iterations. In order to
control the computational resources required, the
optimization may be performed for individual spatial subsegments
(e.g., for individual holons), at different image
resolutions (e.g., low resolution images first) or different time
resolutions (e.g., initially less than every frame), or for
different color channel representations (e.g., first for
luminosity, then for other color channels). It should be
noted that more emphasis should be placed on estimating
major factors with reliable loadings, and less emphasis on
minor factors with less reliable loadings. This may be
controlled by the ScoreRidge parameter from the Multipass
Controller which drives unreliable scores toward zero.

Score estimation by projection of the estimated
change field DX_Ref,n on 'known' loadings in X_Ref does not
require any image decoding of the reference model. Instead,
statistical projections (multivariate regressions) of the
obtained change field DX_Ref,n (regressands) on known loadings
in X_Ref (regressors) are used. The regression is carried out
for all factors simultaneously within each domain's
sub-operand and for each holon, using least squares multiple
linear regression. If the weights of the different pixels
are changed, e.g., for outlier pixels, or the regressor
loadings become highly non-orthogonal, then a reduced rank
regression method is preferably used. Otherwise, the
statistical modelling becomes highly unstable, especially for
intercorrelated factors with low weighted loading
contributions. In a preferred embodiment, the regression is
performed using standard biased partial least squares
regression (PLSR) or principal component regression (PCR), as
outlined in detail in H. Martens and T. Naes, Multivariate
Calibration, pp 73-166, (John Wiley & Sons, 1989), which is
incorporated herein by reference.

Other robust regression techniques, such as purely
non-metric regressions or conventional ridge regressions
utilizing a ridge parameter (H. Martens and T. Naes,
Multivariate Calibration, pp 230-232, (John Wiley & Sons, 1989),
which is incorporated herein by reference), may be used. The
ridge parameter serves to stabilize the score estimation of
minor factors. Ridging may also be used to stabilize the
latent regressor variables in the PLSR or PCR regression.
Alternatively, the scores may be biased towards zero by
controlling the ScoreRidge parameter from the Multipass
Controller so that only major factors are used in the
initial estimation process for the Change Field stabilization.
The uncertainties of the scores may be calculated using
standard sensitivity analysis or linear model theory, as
discussed in H. Martens and T. Naes, Multivariate
Calibration, pp. 168, 206, (John Wiley & Sons, 1989), which is
incorporated herein by reference.
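
The projection of a change field onto known loadings, with an
optional ridge term that biases the scores toward zero, can be
sketched as below. This is ordinary ridge-stabilized least
squares solved through the normal equations; it stands in for,
and is not, the PLSR or PCR regression of the preferred
embodiment. The helper names and the example data are
assumptions.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_scores(loadings, change, ridge=0.0):
    """Scores u minimising |change - sum_f u[f]*loadings[f]|^2 + ridge*|u|^2."""
    n = len(loadings)
    gram = [[sum(p * q for p, q in zip(loadings[i], loadings[j]))
             + (ridge if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    rhs = [sum(p * d for p, d in zip(loadings[i], change)) for i in range(n)]
    return solve(gram, rhs)


loadings = [[1.0, 1.0, 1.0, 1.0],
            [1.0, -1.0, 1.0, -1.0]]
change = [3.0, 1.0, 3.0, 1.0]      # equals 2*loadings[0] + 1*loadings[1]

plain = ridge_scores(loadings, change)
shrunk = ridge_scores(loadings, change, ridge=4.0)
print(plain, shrunk)
```

With a ridge value equal to each loading's sum of squares, both
scores are halved, which shows how a larger ridge parameter
drives scores toward zero.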

Residual Change Field Estimator 
The Residual Change Field Estimator 1210
determines the remaining unmodelled residual EX_Ref,n by removing
the effects of the various scores which were estimated in the
Score Estimator 1202 from the respective change fields DX_Ref,n
for the various sub-operands and holons. In the preferred
embodiment, the effects of the factors (e.g. the sum of
available loadings multiplied by the appropriate scores) are
simply subtracted from the change fields. For example, in
the case of red intensity:

     ER_Ref,n = DR_Ref,n - (R(0)_Ref*uR(0)_n + R(1)_Ref*uR(1)_n + ...)

Optionally, the model parameters used in this residual
construction may be quantized in order to make sure that the
effects of quantization errors are fed back to the encoder
for possible subsequent correction.
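
The red-intensity residual computation can be written out
directly. The sketch below assumes list-based fields, one
holon and unquantized parameters; the names and numbers are
illustrative only.

```python
def residual_change_field(change, loadings, scores):
    """Subtract the modelled part sum_f loadings[f]*scores[f] from change."""
    modelled = [sum(loading[pel] * score
                    for loading, score in zip(loadings, scores))
                for pel in range(len(change))]
    return [d - m for d, m in zip(change, modelled)]


dr_ref_n = [3.5, 1.0, 3.0, 0.0]            # red change field DR_Ref,n
r_loadings = [[1.0, 1.0, 1.0, 1.0],        # R(0)_Ref
              [1.0, -1.0, 1.0, -1.0]]      # R(1)_Ref
ur_n = [2.0, 1.0]                          # scores uR(0)_n, uR(1)_n

er_ref_n = residual_change_field(dr_ref_n, r_loadings, ur_n)
print(er_ref_n)
```

Only the first and last pels carry change that the two known
factors do not explain.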

Spatial Model Widener 
The Spatial Model Widener 1214 of the Interpreter
accumulates the residual change fields EX_Ref,n for frame n
along with the unmodelled residuals from previous frames.
These residual change fields represent as yet unmodelled
information for each holon and each operand (domain) and
sub-operand. These residuals are weighted according to
their uncertainties, and statistically processed in order to
extract new factors. This factor extraction may preferably
be accomplished by performing NIPALS analysis on the
weighted pixel-frame matrix of unmodelled residuals, as described
in e.g. H. Martens and T. Naes, Multivariate Calibration, pp
97-116 and p. 163 (John Wiley & Sons, 1989), which is
incorporated herein by reference, or on their frame by frame
crossproduct matrix, see H. Martens and T. Naes,
Multivariate Calibration, p. 100 (John Wiley & Sons, 1989),
which is incorporated herein by reference. However, this
iterative NIPALS method does not necessarily have to iterate
to full convergence for each factor. Alternatively, the
factor extraction from the weighted pixel-frame matrix of
unmodelled residuals may be attained using singular value
decomposition, Karhunen-Loeve transforms, or eigen analysis
using Hotelling transforms, such as are outlined in detail
in, e.g., R.C. Gonzales and R.E. Woods, Digital Image
Processing, pp 148-156, (Addison-Wesley 1992), which is
incorporated herein by reference, and Carlo Tomasi and Takeo
Kanade, SHAPE AND MOTION WITHOUT DEPTH, IEEE CH2934-8/90
pp. 91-95, 1990, which is incorporated herein by reference.
The significant change structures in the resulting
accumulated residual matrix are extracted as new factors and
included as part of the model [X_Ref, U_seq]. Change structures
which involve several pixels over several frames are deemed to
be significant. The Spatial Model Widener portion of the
Interpreter may be used for both local models 670, as well as
more complete sequence or subsequence models 650.
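
A minimal sketch of extracting one new factor with the NIPALS
iteration is given below. It assumes an unweighted residual
matrix with one row per frame and one column per pel, and it
omits the uncertainty weighting described above; the function
names, the convergence tolerance and the data are assumptions.

```python
import math


def nipals_factor(matrix, iterations=100, tol=1e-12):
    """Extract one factor so that matrix ~ outer(scores, loading).

    Rows are frames and columns are pels; the loading is normalised
    to unit length and the scores carry the scale.
    """
    n_frames, n_pels = len(matrix), len(matrix[0])
    # Start from the pel (column) with the largest sum of squares.
    best_pel = max(range(n_pels),
                   key=lambda j: sum(row[j] ** 2 for row in matrix))
    scores = [row[best_pel] for row in matrix]
    loading = [0.0] * n_pels
    for _ in range(iterations):
        tt = sum(t * t for t in scores)
        if tt == 0.0:
            break
        loading = [sum(matrix[i][j] * scores[i] for i in range(n_frames)) / tt
                   for j in range(n_pels)]
        norm = math.sqrt(sum(p * p for p in loading))
        loading = [p / norm for p in loading]
        new_scores = [sum(x * p for x, p in zip(row, loading))
                      for row in matrix]
        moved = sum((a - b) ** 2 for a, b in zip(new_scores, scores))
        scores = new_scores
        if moved < tol:
            break
    return scores, loading


def deflate(matrix, scores, loading):
    """Remove the extracted factor, leaving the still unmodelled part."""
    return [[x - t * p for x, p in zip(row, loading)]
            for row, t in zip(matrix, scores)]


# Accumulated residuals of three frames over three pels (rank one).
residuals = [[3.0, 0.0, 4.0],
             [6.0, 0.0, 8.0],
             [9.0, 0.0, 12.0]]
scores, loading = nipals_factor(residuals)
remaining = deflate(residuals, scores, loading)
print(scores, loading)
```

Repeating the extraction on `remaining` would yield further
factors; stopping the inner loop early corresponds to not
iterating to full convergence.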

In the case of real time encoding, the effect of
the remaining unmodelled residuals from each individual
frame may be scaled down as time passes, and removed from
the accumulation of unmodelled residuals if they fall below
a certain level. In this way, residuals remaining for a
long time and not having contributed to the formation of any
new factors are essentially removed from further
consideration, since statistically, there is a very low
probability that they will ever contribute to a new factor. In
this embodiment, the Spatial Model Widener 1214 produces
individual factors that may be added to the existing model.
Subsequently, this new set of factors, i.e., model, may be
optimized in the Temporal Model Updater 1206 and Spatial Model
Updater 1208, under the control of the Multipass Controller.
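
The ageing of accumulated residuals in real time encoding
might be sketched as follows. The decay factor, the energy
threshold and the use of the sum of squares as the "level" are
all assumptions; the text does not specify them.

```python
def age_residuals(accumulated, decay=0.5, threshold=0.1):
    """Scale every stored residual field and drop the faded ones.

    A field is dropped when its sum of squares after scaling falls
    below `threshold`.
    """
    kept = []
    for field in accumulated:
        scaled = [decay * value for value in field]
        energy = sum(v * v for v in scaled)
        if energy >= threshold:
            kept.append(scaled)
    return kept


# Residual fields of two earlier frames, two pels each.
store = [[1.0, -1.0], [0.2, 0.1]]
store = age_residuals(store)
print(store)
```

Calling the routine once per new frame gives an exponentially
fading memory of old residuals.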

In an alternative embodiment, the existing model
is analyzed together with the change fields in order to
generate a new model. This new model preferably includes
factors which incorporate the additional information from
the newly introduced change fields. Essentially, the entire
model [X_Ref, U_seq] is re-computed as each new frame is
introduced. This is preferably accomplished using loadings X_Ref
and scores U_seq which are scaled so that the score matrix U_seq
is orthonormal (see H. Martens and T. Naes, Multivariate
Calibration, p. 48, (John Wiley & Sons, 1989), which is
incorporated herein by reference). The different factor
loading vectors in X_Ref then have different sums of squares
reflecting their relative significance. The new loadings
[X_Ref](new) are then generated using factor analysis, e.g.,
singular value decomposition (svd), of the matrix consisting
of [X_Ref(old), DX_Ref,n]. This is a simplified, one-block
svd-based version of the two-block PLSR-based updating method
described in H. Martens and T. Naes, Multivariate
Calibration, pp. 162-123, (John Wiley & Sons, 1989), which is
incorporated herein by reference. New scores corresponding
to the new loadings are also obtained in this process.

Three-dimensional depth estimation

The Spatial Model Widener 1214 may also be used to
estimate the approximate three dimensional depth structure
of the pixels in a scene forming part of a frame sequence.
This type of estimation is important for modelling of
objects moving in front of each other, as well as for
modelling of horizontally or vertically rotating objects. The
depth information may also be of intrinsic interest by
itself.

Depth modelling requires the depth to be
estimated, at least approximately, for the pixels involved in an
occlusion. It is preferable to represent this estimated
information at the involved pixel positions in the reference
image model.

Depth estimation may be performed using any of a
number of different methods. In a preferred embodiment,
topological sorting of pixels, based on how some pixels
occlude other pixels in various frames, is used. For pixels
where potential occlusions are detected (as indicated in the
warnings from the Local ChangeField Estimator), different
depth hypotheses are tried for several consecutive frames.
For each frame, the ChangeField Estimator is repeatedly
operated for the different depth hypotheses, and the
resulting modelling success of the input frame intensity i_n
using the different hypotheses is accumulated. The depth
hypothesis that results in the most consistent and accurate
representation of the intensity data over the frames tested
is accepted and used as the depth model information.
Initially, this depth information may be used to establish the
basic depth Z(0)_Ref for those pixels where this is required.
Subsequently in the encoding process for the same sequence,
the same techniques may be used to widen the depth change
factor model with new factors Z(f)_Ref, f=1,2,... for those
pixels that show more complex occlusion patterns owing to
their depth changing from one frame to another.
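
The accumulation of modelling success over several frames can
be sketched as below. The sketch assumes that the ChangeField
Estimator has already been run for every hypothesis and frame
and that each run is summarized by a single lack-of-fit error;
the hypothesis labels, the error values and the function name
are hypothetical.

```python
def choose_depth_hypothesis(errors_per_frame):
    """Return the hypothesis with the lowest accumulated modelling error.

    errors_per_frame holds one dict per tested frame, mapping a
    hypothesis label to the intensity lack-of-fit obtained with it.
    """
    totals = {}
    for frame_errors in errors_per_frame:
        for hypothesis, error in frame_errors.items():
            totals[hypothesis] = totals.get(hypothesis, 0.0) + error
    best = min(totals, key=totals.get)
    return best, totals


# Lack-of-fit for two competing orderings over three consecutive frames.
errors = [
    {"A_in_front": 1.0, "B_in_front": 4.0},
    {"A_in_front": 2.0, "B_in_front": 3.5},
    {"A_in_front": 0.5, "B_in_front": 5.0},
]
best, totals = choose_depth_hypothesis(errors)
print(best, totals)
```

A fuller version would also check consistency across frames
rather than only the summed error.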

In an alternative embodiment, singular value
decomposition of the address change fields DARef,n may be
used to establish 3D depth information, as outlined in Carlo
Tomasi and Takeo Kanade, "SHAPE AND MOTION WITHOUT DEPTH",
IEEE CH2934-8/90, pp. 91-95, 1990.

Iterative control for frame n
A special mode of operation for the Spatial Model
Widener 1214 is used during iterative optimization for each
frame n. When separate (competing) estimates of local
change fields dan, din, dpn are used, as described above in
the preferred embodiment of the Local ChangeField Estimator
850, the Spatial Model Widener 1214 must formulate a joint
compromise DXRef,n(joint) to be used simultaneously for all
domains. In the preferred embodiment, information from only
one of the domains is accepted into the joint change field
DXRef,n(joint) during each iteration.

At the beginning of the iterative estimation of
each frame, smile changes are accepted as the most probable
changes. However, throughout the iterative estimation, care
must be taken that the accepted smile fields be sufficiently
smooth and do not give erroneous occlusions in the
subsequent iteration(s). In general, change field
information that fits the already established factor
loadings in XRef (as determined in the Score Estimator 1202)
is accepted in preference to unmodelled residuals EXRef,n
(as determined in the Residual ChangeField Estimator 1210),
which are only accepted as change field information towards
the end of the iterative process for each frame. Thus, the
change fields are modified according to the particular stage
of encoding and the quality of the change fields of this
iteration compared to those of previous iterations. In each
iteration, the resulting accepted change field information
is accumulated as the joint change field DXRef,n(joint).

During each iteration, the Interpreter 720 must
convey this joint change field DXRef,n(joint) back to the
ChangeField Estimator 710 for further refinement in the next
iteration. This is accomplished by including the joint
change field DXRef,n(joint) as one extra factor in XRef (with
score always equal to 1). Thus, this extra factor
accumulates incremental changes to the change field for
frame n from each new iteration. At the end of the iterative
process, this extra factor represents the accumulated joint
change field, which can then be used for score and residual
estimation, widening, deepening, updating and extending, as
described above.

Model Updaters 
The two updating modules, the Temporal Model
Updater 1206 and Spatial Model Updater 1208, serve to
optimize the temporal and spatial model with respect to
various criteria, depending on the application. In the case
of real-time video coding, such as in video conference
applications, the Temporal Model Updater 1206 computes the
eigenvalue structure of the covariance matrix between the
different factors' scores within each domain, as time
passes. Variation phenomena no longer active (e.g., a person
who has left the video conference room) are identified as
dimensions corresponding to low eigenvalues in the
inter-score covariance matrices, and are thus eliminated
from the score model in the Temporal Model Updater 1206. The
corresponding loading dimension is eliminated from the
loadings in the Spatial Model Updater 1208. The resulting
eigenvalue-eigenvector structure of the inter-score
covariance matrix may also be used to optimize the
quantization and transmission control for the temporal
parameters of the other, still active factors.

During encoding of video data (both real-time and
off-line), unreliable factor dimensions are likewise
eliminated as the encoding proceeds repeatedly through the
sequence, by factor rotation of the loadings and scores in
the two Model Updaters 1206 and 1208 based on singular value
decomposition of the inter-score covariance matrix or the
inter-loading covariance matrix, and eliminating dimensions
corresponding to low eigenvalues.

The eigen-analysis of the factor scores in the
Temporal Model Updater 1206 and of the factor loadings in
the Spatial Model Updater 1208 corresponds to a type of
meta-modelling, as will be discussed in more detail below.
The Spatial Model Updater 1208 may check for spatial pixel
cluster patterns in the loading spaces indicating a need for
changes in the holon segmentation in the Spatial Model
Extender 1216.

The Model Updaters 1206 and 1208 may also perform
conventional factor analysis rotation, such as varimax
rotation, to obtain a "simple structure" for the factor
scores (in the case of the Temporal Model Updater 1206) or
loadings (in the case of the Spatial Model Updater 1208),
for improved compression, editing and memory usage. Factor
analytic "simple structures" can be understood by way of the
following example. First, assume that two types of change
patterns, e.g., blush patterns "A" (blushing cheeks) and "B"
(room lighting), have been modelled using two blush factors,
but the blush factors have coincidentally combined the
patterns in such a way that factor 1 models "A" and "B" and
factor 2 models "A" and "-B." Factor rotation to a simple
structure, in this case, means computing a new set of
loadings by multiplying the two loadings with a 2x2 rotation
matrix g so that after the matrix multiplication, only
pattern "A" is represented in one factor and only pattern
"B" is represented in the other factor. Corresponding new
scores are obtained by multiplying the original scores with
the inverse of matrix g. Alternatively, the original scores
may be used. However, the new loadings must then be
multiplied by the inverse of g.
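
The "A"/"B" example above can be worked through numerically.
The following is a minimal sketch under stated assumptions:
the pixel patterns, the scores and the particular matrix g
are invented for illustration, and g is simply an invertible
2x2 transform that unmixes the two factors rather than a
strictly orthogonal rotation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Invert a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical blush patterns over four pixels.
A = [1.0, 0.0, 2.0, 0.0]   # e.g. blushing cheeks
B = [0.0, 3.0, 0.0, 1.0]   # e.g. room lighting

# Mixed loadings: factor 1 models A + B, factor 2 models A - B.
P = [[a + b for a, b in zip(A, B)],
     [a - b for a, b in zip(A, B)]]

# Scores for three frames (rows) on the two mixed factors.
U = [[1.0, 0.0], [0.5, 0.5], [2.0, -1.0]]

# Transform g chosen so that g * P separates the patterns.
g = [[0.5, 0.5], [0.5, -0.5]]
new_loadings = matmul(g, P)            # rows are now A and B
new_scores = matmul(U, inverse_2x2(g)) # scores times inverse of g

# The modelled data are unchanged by the rotation.
reconstruction_before = matmul(U, P)
reconstruction_after = matmul(new_scores, new_loadings)
```

Because the new scores are multiplied by the inverse of g,
the product of scores and loadings is the same before and
after the rotation; only the way the variation is divided
between the two factors changes.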

Yet another function of the Temporal Model Updater
1206 is to accumulate multidimensional histograms of
"co-occurrence" of various model parameters, e.g., smile and
blush factors. This histogram gives an accumulated count of
how often various combinations of score values of the
various domains occur simultaneously. If particular patterns
of co-occurrence appear, this may indicate the need for
deepening the model, e.g., by converting blush factor
information into smile factor information.
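
A co-occurrence histogram of this kind could be accumulated
as in the sketch below. The bin width, the two-dimensional
(one smile score, one blush score) layout and the function
name are assumptions made for the example only.

```python
from collections import Counter


def cooccurrence_histogram(smile_scores, blush_scores, bin_width=0.5):
    """Count how often binned (smile, blush) score pairs occur together.

    Each argument holds one score per frame; the histogram key is
    the pair of bin indices for that frame.
    """
    counts = Counter()
    for s, b in zip(smile_scores, blush_scores):
        key = (int(s // bin_width), int(b // bin_width))
        counts[key] += 1
    return counts


# Scores for five frames (hypothetical values).
smile = [0.1, 0.2, 1.1, 1.2, 1.3]
blush = [0.1, 0.3, 1.0, 1.2, 0.1]
histogram = cooccurrence_histogram(smile, blush)
```

A histogram whose mass concentrates along one diagonal, as
in bins (0, 0) and (2, 2) here, would be one indication that
the two factors vary together.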

Spatial Model Extender 
The Spatial Model Extender 1216 organizes and
reorganizes data into segments or holons. In the case of
video coding, the segments are primarily spatial holons, and
thus, the extender is referred to as a "Spatial" Model
Extender. The Spatial Model Extender 1216 receives as input
a set of holons, each represented by pixel loadings XRef,
sequence frame scores USeq, change fields DXRef,n, and
unmodelled change field residuals EXRef,n. The Spatial Model
Extender 1216 also receives as input the abnormality
warnings from the ChangeField Estimator 710 and the actual
input frame xn, in addition to various input control
parameters. The Spatial Model Extender 1216 processes these
inputs and outputs an updated set of holons, each with pixel
loadings XRef, sequence frame scores USeq, unmodelled
residuals EXRef,n, and various output control parameters.



The Spatial Model Extender 1216 is activated by
the Multipass Controller 620 whenever the accumulated signal
from the warnings output from the ChangeField Estimator
indicates a significant amount of unmodelled spatial
information in a new frame xn. The segmentation of as yet
unmodelled regions into new holons may be performed using
the estimated address change fields DARef,n, e.g. as
described in John Y.A. Wang and Edward H. Adelson, "LAYERED
REPRESENTATION FOR IMAGE SEQUENCE CODING", IEEE ICASSP,
Vol.5, pp. 221-224, Minneapolis, Minnesota, 1993, which is
incorporated herein by reference. This is particularly
important in the areas where the incoming warnings indicate
the need for segmentation. The pixels in such areas are
given particularly high weights in the search for segments
with homogeneous movement patterns.

As an alternative, or even additional, method of
segmentation, the segments may be determined using various
factor loading structures in XRef, such as clusters of
pixels in the factor loading vector spaces (f=1,2,...) as
determined using standard cluster analysis in the factor
loading spaces. Clusters with simple internal structures
indicate pixels that change in related ways, and are thus
possible candidates for segments. In addition, those pixels
that are adjacent to each other in the address space ARef(0)
are identified as stronger candidates for segmentation. In
this manner, new segments are formed. On the other hand,
existing segments are expanded or merged if the new segments
lie adjacent to the existing ones and appear to have similar
temporal movement behaviour. Existing segments that show
heterogeneous movements along the edges may be contracted to
a smaller spatial region, and segments that show
heterogeneous movements in their spatial interiors may be
split into independent holons.

One of the probabilistic properties of PRef is used
to indicate a particularly high probability of segment shape
changes or extensions along existing segment edges, i.e.,
there is a probability that seemingly new segments are in
fact just extensions of existing segments, extended at the
segment edges. Similarly, this probabilistic property may
be used to classify into segments those new objects
appearing at the image edge. In addition, this property may
also be used to introduce semi-transparency at holon edges.

The Spatial Model Extender 1216, as operated by
the Multipass Controller 620, produces temporary holons or
segments which are used in the initial stabilization or
tentative modelling in the encoding process; these holons
may be merged or deleted during the iterative encoding
process, resulting in the final holons used to model each
individual sequence at the end of the encoding process. As
illustrated in Figure 3, the introduction of new holons
makes the Extended Reference Image larger than the
individual input frames, so the holons must be spatially
stored in the Extended Reference Image Model XRef so as not
to overlap with each other. Alternatively, storage methods
such as the multilayer structure described in John Y.A. Wang
and Edward H. Adelson, "LAYERED REPRESENTATION FOR IMAGE
SEQUENCE CODING", IEEE ICASSP, Vol.5, pp. 221-224,
Minneapolis, Minnesota, 1993, which is incorporated herein
by reference, may be used.

Model Deepener 

The Model Deepener 1218 of the Interpreter 720
provides various functions that improve the modelling
efficiency. One of these functions is to estimate
transparency change fields as a sub-operand of the
probabilistic domain DPRef,n. This may be performed using
the technique described in Masahiko Shizawa and Kenji Mase,
"A UNIFIED COMPUTATIONAL THEORY FOR MOTION TRANSPARENCY AND
MOTION BOUNDARIES BASED ON EIGENENERGY ANALYSIS", IEEE
CH2983-5/91, pp. 289-295, 1991, which is incorporated herein
by reference.

Further, the Model Deepener 1218 is used to convert
blush factors into smile factors whenever the amount and
type of blush modelling of a holon indicates that it is
inefficient to use blush modelling to model movements. This
may be accomplished, for example, by reconstructing
(decoding) the particular holon and then analyzing
(encoding) it using an increased bias towards selection of a
smile factor, rather than a blush factor. Similarly, smile
factors may be converted to nod factors, whenever the smile
factor loadings indicate holons having spatial patterns
consistent with affine transformations of solid objects,
i.e., translations, by determining the address change fields
DARef,n for the holons and then modelling them in terms of
pseudo smile loadings corresponding to the various affine
transformations.

The present invention includes a decoder that
reconstructs images from the spatial model parameter
loadings XRef and temporal model parameter scores U. In
applications such as video compression, storage and
transmission, the primary function of the decoder is to
reproduce a certain input sequence of frames
[xn, n=1,2,...] = xSeq using the scores
[un, n=1,2,...] = USeq which were estimated during the
encoding of the sequence [xn, n=1,2,...] = xSeq. In other
applications such as video games and virtual reality, the
scores at different points in time [un, n=n1,n2,...] = U may
be generated in real time, for example, by a user activated
joystick.

In the present description, the predicted results
for each frame n are denoted as the forecasted frame m.
Thus, xm is equivalent to xn hat.

A preferred embodiment of the Decoder 1300 is
illustrated in block diagram form in Figure 13. This
Decoder 1300 is substantially equivalent to the Internal
Decoder 830 of the ChangeField Estimator 710 (Figure 8) of
the Encoder. However, the Decoder 1300 of Figure 13 includes
some additional functional elements. These additional
elements are discussed in detail in the attached appendix,
DECODER-APPENDIX.

The resulting change fields DXRef,m 1358 are then
passed to the Adder 1330 where they are added to the basic
reference image X(0)Ref 1360, to produce XmRef 1362, i.e.,
the forecasted values for frame m given in the reference
position. This contains the changed values which the various
holons in the reference image will assume upon output in the
forecasted frame; however, this information is still given
in the reference position.

These changed values given in the reference
position, XmRef 1362, are then "moved" in the Mover 1340
from the reference position to the m position using the
movement parameters provided by the address change field
DARef,m 1364. In the case of an internal decoder 830 of an
encoder 600, the Mover 1340 may provide the return field
damRef 1366, which may be used to move values back from the
m position to the reference position.

The primary output of the Mover 1340 is the forecasted
result xm, to which error corrections exm 1368 may
optionally be added. The resulting signal may then be
filtered inside the post processor 1350, for example, to
enhance edge effects, in order to yield the final result
1370. The Adder 1330, Mover 1340 and post processor 1350 may
employ standard decoding techniques, such as are outlined in
George Wolberg, Digital Image Warping, Chapter 7, (IEEE
Computer Society Press 1990), which is incorporated herein
by reference.
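
The Adder and Mover stages described above can be pictured
with a one-dimensional sketch. This is a deliberately
reduced illustration and not the decoder of Figure 13: it
assumes a single row of pixels, blush (intensity) factors
only, integer displacements, and no occlusion or
transparency handling; all names are hypothetical.

```python
def decode_frame(x0_ref, blush_loadings, blush_scores, da_ref, width):
    """Reconstruct one frame from a reference row of pixels.

    x0_ref: basic reference intensities, one value per pixel.
    blush_loadings: one loading vector per factor.
    blush_scores: one score per factor for the frame to decode.
    da_ref: integer displacement of each reference pixel.
    width: number of pixels in the output frame.
    """
    # Adder: reference image plus the sum of score x loading.
    xm_ref = list(x0_ref)
    for loading, score in zip(blush_loadings, blush_scores):
        for k in range(len(xm_ref)):
            xm_ref[k] += score * loading[k]

    # Mover: place each changed reference value at its new address.
    frame = [None] * width          # None marks positions left unfilled
    for k, value in enumerate(xm_ref):
        target = k + da_ref[k]
        if 0 <= target < width:     # values moved off-frame are dropped
            frame[target] = value
    return xm_ref, frame


x0 = [10.0, 20.0, 30.0, 40.0]
loadings = [[1.0, 1.0, 0.0, 0.0]]   # one blush factor
scores = [5.0]
displacement = [1, 1, 1, 1]         # shift every pixel one step right
xm_ref, frame = decode_frame(x0, loadings, scores, displacement, 5)
```

Keeping the intensity change in the reference position and
applying the movement afterwards mirrors the order of the
Adder 1330 and the Mover 1340 in the text.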

The Decoder 1300 may also include additional
functionality for controlling and handling the external
communication, decryption, local storage and retrieval of
model parameters which are repeatedly used, for
communication to the output medium (such as a computer video
display terminal or TV screen) and other functions that are
readily understood by those skilled in the art.

It should be noted that the Mover operators 1040
(1340) and 1010 (870) may use different methods for
combining two or more pieces of information which are placed
at the same coordinate position. In the preferred
implementation for video encoding and decoding, different
information is combined using 3D occlusion, modified
according to the transparency of the various overlaid media.
For other applications, such as analysis of images of
two-way electrophoresis gels for protein analysis, the
contributions of different holons may simply be added.


ENCODER OPERATION - MULTIPASS CONTROLLER 

Encoder System Control and Operation 
The operation of the encoder/decoder system described
in detail above will now be explained for an off-line video
encoding application. First, the simplified encoder
(alternative embodiment) and the full encoder (preferred
embodiment) will be compared. Then, the simplified encoder
will first be described, followed by a description of the
full encoder.

A video encoding system must be able to detect
sequences of sufficiently related image information, in order
that they be modelled by a sequence model. For each such
sequence, a model must be developed in such a way as to give
adequate reconstruction quality, efficient compression, and
editability. This must be accomplished within the physical
constraints of the encoding system, the storage/transmission
and decoding systems.

To achieve compact, parsimonious modelling of a
sequence, the changes in the sequence should be ascribed to
appropriate domain parameters, viz., movements should mainly
be modelled by smile and nod factors, intensity changes
should mainly be modelled by blush factors and transparency
effects mainly modelled by probabilistic factors. Effective
modelling of various change types to the proper domain
parameters requires statistical stabilization of the model
parameter estimation, in addition to good separation of the
various model domains. This in turn requires modelling over
many frames. The two encoder embodiments differ in how they
accomplish this task.

The simplified encoder employs a simple sequential
control and operation mechanism that results in identification
of suitable frame sequences during parameter estimation.
However, it does not attempt to optimize the simultaneous
statistical modelling in the various domains. The full
encoder, on the other hand, requires sequence identification
as part of a separate preprocessing stage. This preprocessing
stage also initializes various statistical weighting functions
that are updated and used throughout the encoding process to
optimize the noise and error robustness of the multi-domain
modelling.

The simplified encoder repeatedly searches through a
video frame sequence for related unmodelled change structures
which may be modelled either as a new factor in the smile
domain, the blush domain, or as a new spatial image
segmentation. The optimal choice from among the potential
smile, blush and segmentation changes is included in the
sequence model, either as a widening of the smile or blush
model, or as an extension or reorganization of the holons. The
search process is then repeated until adequate modelling is
attained.

The full encoder, in contrast, gradually widens,
extends and deepens the model for a given sequence by passing
through the sequence several times, each time attempting to
model each frame in the three domains in such a way as to be
maximally consistent with the corresponding modelling of the
other frames.

In the simplified encoder, the estimation of
unmodelled change fields for each frame is relatively simple,
since each domain is modelled separately. Smile change fields
DARef,n, n=n1,n2,... are extracted and modelled in one pass,
which may be shorter than the entire sequence of frames, and
intensity change fields DIRef,n, n=n1,n2,... are extracted and
modelled in a second pass, which may also be shorter than the
entire sequence of frames. Each pass is continued until the
incremental modelling information obtained is outweighed by
the modelling complexity. In the full encoder, the
corresponding estimation of unmodelled change fields for each
frame is more complicated, since the change fields for each
frame are modelled jointly and therefore must be mutually
compatible. This compatibility is obtained by an iterative
development of change fields in the different domains for each
frame.



Simplified Encoder Systems Control and Operation

For each frame, the simplified encoder uses the Score
Estimator 1202 of the Interpreter 720 to estimate factor scores
un for the already established factors in XRef. The model may be
temporarily widened with tentatively established new factors in
the domain being modelled. Subsequently, the ChangeField
Estimator 710 is used to generate either an estimate of
unmodelled smile change fields DARef,n or unmodelled blush
change fields DIRef,n. In each case, the tentative new factors
are developed in the Spatial Model Widener 1214. The
Interpreter 720 also checks for possible segmentation
improvements in the Spatial Model Extender 1216. The Multipass
Controller 620, in conjunction with the Spatial Model Widener
1214, widens either the blush or the smile model with a new
factor, or alternatively imposes spatial
extension/reorganization in the Spatial Model Extender 1216.
The Multipass Controller 620 also initiates the beginning of a
new sequence model whenever the change fields exhibit dramatic
change. The process may then be repeated until satisfactory
modelling is obtained.



Full Encoder Systems Control and Operation

Preprocessing

The input data are first converted from the input
color space, which may for example be RGB, to a different
format, such as YUV, in order to ensure better separation of
luminosity and chrominance. This conversion may be carried out
using known, standard techniques. In order to avoid confusion
between the V color component in YUV and the V (vertical)
coordinate in HVZ address space, this description is given in
terms of RGB color space. The intensity of each converted
frame n is referred to as in. Also, the input spatial
coordinate system may be changed at various stages of the
encoding and decoding processes. In particular, the spatial
resolution may during preprocessing be changed by successively
reducing the input format (vertical and horizontal pels,
addresses an) by a factor of 2 in both horizontal and vertical
direction using standard techniques. This results in a
so-called "Gaussian pyramid" representation of the same input
images, but at different spatial resolutions. The smaller,
low-resolution images may be used for preliminary parameter
estimation, and the spatial resolution increased as the model
becomes increasingly reliable and stable.
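
The successive reduction by a factor of 2 can be sketched as
follows. As a simplifying assumption, this example averages
each 2x2 block of pixels (a box filter) instead of applying
the Gaussian low-pass filter that a true Gaussian pyramid
would use; the function names are hypothetical.

```python
def downsample_by_two(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1] +
              image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(image, levels):
    """Return [full resolution, half resolution, quarter resolution, ...]."""
    pyramid = [image]
    for _ in range(levels - 1):
        image = downsample_by_two(image)
        pyramid.append(image)
    return pyramid


# A 4x4 test image whose pixel value is row * 4 + column.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image, 3)
```

Parameter estimation would then start on the last, smallest
level of the list and move towards the first as the model
stabilizes.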

Continuing, preliminary modelabilities of the input
data are first estimated. For each of the successive spatial
resolutions, the intensity data for each frame are analyzed
in order to assess the probabilities of whether the intensity
data for the individual pixels are going to be easy to model
mathematically. This analysis involves determining different
probabilities, which are referred to as pn and are discussed
in detail below.

The preliminary modelability includes a determination
of the two-dimensional recognizability of the input data, i.e.,
an estimate of how "edgy" the different regions of the image
are. "Edgy" regions are easier to detect and follow with
respect to motion than continuous regions. Specifically, an
estimate of the degree of spatially recognizable structures
p(1)n is computed such that pixels representing clear 2D
spatial contours and pixels at spatial corner structures are
assigned values close to 1, while pixels in continuous areas
are assigned values close to zero. Other pixels are assigned
intermediate values between zero and one. This may be carried
out using the specific procedure set forth in Carlo Tomasi and
Takeo Kanade, "SHAPE AND MOTION WITHOUT DEPTH", IEEE
CH2934-8/90, pp. 91-95, 1990, which is incorporated herein by
reference, or in Rolf Volden and Jens G. Balchen, "DETERMINING
3-D OBJECT COORDINATES FROM A SEQUENCE OF 2-D IMAGES", Proc. of
the Eighth Internatl Symposium on Unmanned Untethered
Submersible Technology, Sept. 1993, pp. 359-369, which is
incorporated herein by reference.

Similarly, the preliminary modelability includes a
determination of the one-dimensional recognizability of the
input data, i.e., an indication of the intensity variations
along either a horizontal or vertical line through the image.
This procedure involves formulating an estimate of the degree
of horizontally or vertically clear contours. Pixels which are
part of clear horizontal or vertical contours (as detected
from, e.g., absolute values of the spatial derivatives in
horizontal and vertical directions) are assigned a value
p(2)n=1, while those which are in continuous areas are
assigned a value of zero, and other pixels are assigned values
in between.
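
One way such a contour estimate could be formed from
absolute spatial derivatives is sketched below. The use of
central differences, the choice of the larger of the two
directional derivatives, and the normalization by a
full-scale intensity of 255 are all assumptions made for
illustration; the description does not prescribe them.

```python
def contour_probability(image, full_scale=255.0):
    """Estimate p(2) in [0, 1] for each pixel from spatial derivatives."""
    rows, cols = len(image), len(image[0])
    p2 = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Neighbours, clamped at the image border.
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            # Strongest absolute derivative, horizontal or vertical.
            gradient = max(abs(right - left), abs(down - up))
            p2[r][c] = min(1.0, gradient / full_scale)
    return p2


# A vertical edge between a dark and a bright region.
image = [[0, 0, 255, 255] for _ in range(3)]
p2 = contour_probability(image)
```

Pixels next to the edge receive the value 1, while pixels in
the flat regions on either side receive 0.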

The preliminary modelability also includes
determining aperture problems, by estimating the probability of
aperture problems for each pixel as p(3)n. Smooth local
movements, i.e., spatial structures that appear to move
linearly over the course of several consecutive frames, are
assigned a maximum value of 1, while pixels where no such
structures are found are assigned a value of 0. Similarly,
structures which appear not to move at all over the course of
several consecutive frames are treated in much the same manner.
Collectively, this estimate of seemingly smooth movement or
non-movement is referred to as p(4)n. This property may also be
used to estimate smooth intensity changes (or non-changes) over
the course of several consecutive frames.

The probability of half pixels is computed and
referred to as p(5)n. Half pixels may arise at boundary edges
and are unreliable because they are an average of spatial
areas of different intensity and, as such, do not represent
true intensities.

Together, the intensity, address and probabilistic
data are symbolized by xn, and include address properties,
intensity properties, and the different probabilistic
properties, such as p(1)n through p(5)n.

The preprocessing also includes detection of sequence
length and the determination of subsequence limits. This is
accomplished by analyzing the change property p(4)n and the
intensities in over the entire sequence and performing a
multivariate analysis of the low-resolution intensities in
order to extract a low number of principal components. This is
followed by a cluster analysis of the factor scores, in order
to group highly related frames into sequences to be modelled
together. If a scene is too long or too heterogeneous, then it
may be temporally split into shorter subsequences for
simplified analysis using local models. Later in the encoding
process, such subsequence models may be merged together into a
full sequence model. In the initial splitting of sequences, it
is important that the subsequences overlap by a few frames in
either direction.

The thermal noise level in the subsequence is
estimated by accumulating the overall random noise variance
associated with each of the intensity channels and storing this
value as the initial uncertainty variance s²in along with the
actual values in.

The preprocessing also produces an initial reference
image XRef for each subsequence. Initially, one frame nRef in
each subsequence is chosen as the starting point for the
reference image. This frame is chosen on the basis of
principal component analysis of the low resolution intensities,
followed by a search in the factor score space for the most
typical frame in the subsequence. Frames within the middle
portion of the subsequence are preferred over frames at the
start or end of the subsequence, since middle frames have
neighboring frames in both directions of the subsequence.

Initialization

Initialization includes setting the initial values of
the various control parameters. First, the ScoreRidge is set
to a high initial value for all domains and all sub-operands.
This parameter is used in the ScoreEstimator 1202 to stabilize
the scores of small factors. (When singular value decomposition
(principal component analysis, etc.) is used for extracting the
factors, the size of individual factors is defined by their
associated eigenvalue size; small factors have small
eigenvalues. In the more general case, small factors are here
defined as factors whose score x loading product matrix has a
low sum of squared pixel values. The size of a factor is
determined by how many pixels are involved and how strongly
they are affected by the loadings of that factor, and by how
many frames are affected and how strongly they are affected by
the factor scores.)
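
The stabilizing effect of a ridge parameter on the score of
a small factor can be illustrated with a single-factor
example. The formula below is ordinary ridge regression of
a change field onto one loading vector; it is offered as an
assumption about how a parameter such as ScoreRidge might
act, since the exact estimator is not given in this passage.

```python
def ridge_score(change_field, loading, score_ridge):
    """Estimate one factor score, shrunk towards zero by score_ridge.

    Plain least squares would give (d . p) / (p . p); adding
    score_ridge to the denominator damps the score, and damps it
    most for factors whose loadings have a small sum of squares.
    """
    numerator = sum(d * p for d, p in zip(change_field, loading))
    denominator = sum(p * p for p in loading) + score_ridge
    return numerator / denominator


loading = [1.0, 1.0, 0.0, 0.0]        # hypothetical small factor
change_field = [2.0, 2.0, 0.0, 0.0]   # observed change for one frame

unregularized = ridge_score(change_field, loading, 0.0)
moderate = ridge_score(change_field, loading, 2.0)
strong = ridge_score(change_field, loading, 6.0)
```

Starting with a high ridge value and lowering it later lets
small factors contribute only once the model has become
reliable.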

SqueezeBlush is set to a high initial value for each frame
in order to make sure that the estimation of smile fields is
not mistakenly thwarted by preliminary blush fields that
erroneously pick up movement effects. Similarly, SqueezeSmile
is set to a high initial value for each frame in order to make
sure that the proper estimation of the blush fields is not
adversely affected by spurious inconsistencies in the
preliminary smile fields. The use of SqueezeBlush and
SqueezeSmile is an iterative process designed to achieve the
proper balance between smile and blush change fields that
optimally model the image changes. The initialization also
includes initially establishing the full reference image XRef
as one single holon, and assuming very smooth movement fields.

The spatial model parameters XRef and temporal model
parameters USeq are estimated by iteratively performing several
passes through the subsequence. For each pass, starting at the
initial reference frame, the frames are searched
bidirectionally through the subsequence on either side of the
frame nRef until a sufficiently satisfactory model is obtained.

For each frame and for each iteration, the statistical
weights for each pixel are determined. These statistical or
reliability weights are an indication of the present
modelability of the pixels in a given frame. These reliability
weights wgts_xn for each pixel for frame n, xn, for the various
sub-operands are:

an: wgts_an = function of (pn, s²an, wn)
in: wgts_in = function of (pn, s²in, wn)

The reliability weights are proportional to the probabilistic
properties pn, and inversely proportional to both the variances
s²an and the warnings wn. Similarly, the reliability weights
wgts_XRef for each pixel in the preliminary model(s) XRef, for
each sub-operand, each factor and each holon are:

ARef: wgts_ARef: inversely proportional function of
(s²ARef) for each factor in each sub-operand.
IRef: wgts_IRef: inversely proportional function of
(s²IRef) for each factor in each sub-operand.
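
A weight with the stated behaviour (growing with the
probabilistic property, shrinking with the variance and with
the warnings) could take the form sketched below. The
specific functional form, the small constant that avoids
division by zero and the function name are assumptions; the
text only states the direction of each dependence.

```python
def reliability_weight(p, variance, warning, eps=1e-9):
    """Per-pixel weight: proportional to p, inversely to variance and warning.

    p: probabilistic modelability property in [0, 1].
    variance: uncertainty variance of the pixel value (>= 0).
    warning: non-negative warning level from the estimator.
    eps: small constant guarding against a zero variance.
    """
    return p / ((variance + eps) * (1.0 + warning))


base = reliability_weight(0.8, 2.0, 0.0)
noisier = reliability_weight(0.8, 4.0, 0.0)     # larger variance
flagged = reliability_weight(0.8, 2.0, 1.0)     # warning raised
less_edgy = reliability_weight(0.4, 2.0, 0.0)   # lower modelability
```

Each of the three variations lowers the weight relative to
the base case, so the corresponding pixel would influence
the parameter estimation less.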

In general, only those factors which are found to be
applicable to a sufficient number of frames are retained.
Multi-frame applicability of the extracted factors is tested by
cross validation or leveraged correction, as described in H.
Martens and T. Naes, Multivariate Calibration, pp. 237-265,
(John Wiley & Sons, 1989), which is incorporated herein by
reference. Specifically, in the case of multi-pass or
iterative estimation, this may include preventing the
contribution due to the current frame n from being artificially
validated as a multi-frame factor based on its own contribution
to the model during an earlier pass.

The estimation of the change field DX_Ref,n and its
subsequent contribution to the model {X_Ref, U_Seq} for each
frame n relative to the subsequence or full sequence model to
which it belongs is an iterative process, which will now be
discussed in detail. For the first few frames encountered in the
first pass through the subsequence, no reliable model has as yet
been developed. Thus, the estimation of the change fields for
these first few frames is more difficult and uncertain than for
subsequent frames. As the model develops further, it
increasingly assists in the stabilization and simplification of
the estimation of the change fields for later frames. Therefore,
during the initial pass through the first few frames, only those
image regions that have a high degree of modelability are used.
In addition, with respect to movement, strong assumptions about
smooth change fields are used in order to restrict the possible
degrees of freedom in estimating the change fields for the first
few frames. Similarly, with respect to blush factors, strong
assumptions about smoothness and multi-frame applicability are
imposed in order to prevent unnecessary reliance on blush
factors alone. As the encoding process iterates, these
assumptions and requirements are relaxed so that true minor
change patterns are properly modelled by change factors.

The encoding process for a sequence according to the
preferred embodiment requires that the joint change fields
DX_Ref,n be estimated for each frame, i.e., the different domain
change fields DA_Ref,n, DI_Ref,n and DP_Ref,n may be used
simultaneously to give acceptable decoded results x_n. As
explained above, this requires an iterative modification of the
different domains' change fields for each frame. The weights,
wgts_x_n and Wgts_X_Ref, defined for address and intensity, are
used for optimization of the estimation of the local change
field dx_n. During this iterative process, the Interpreter 720
is used primarily for accumulating change field information in
DX_Ref,n(joint), as described above. The values in the already
established sequence model X_Ref, U_Seq are not modified.

In the iterative incremental estimation of the change
field information DX_Ref,n(joint), the model estimation keeps
track of the results from the individual iterations, and
backtracks out of sets of iterations in which the chosen
increments fail to produce satisfactory modelling stability.

Once the joint change field DX_Ref,n(joint) has been
estimated for a given frame, it is analyzed in the Interpreter
720 in order to optimize the sequence model X_Ref, U_Seq based
on DX_Ref,n(joint).

Developing the sequence model 
The reliability weights for frame n and for the model
are updated. Subsequently, scores u_n and residuals EX_Ref,n are
estimated, and the change field information is accumulated for
the possible widening of the reference model with new valid
change factors. The reference model is extended using
segmentation, improvement of 3D structures is attempted, and
opportunities for model deepening are checked. All of these
operations will be described in detail below.

When all the frames in a subsequence have been thus
analysed so that a pass is completed, the weights and
probabilistic properties are further updated to enhance the
estimation during the next pass, with the obtained model being
optionally rotated statistically to attain a simpler factor
structure. In addition, the possibility of merging a given
subsequence with other subsequences is investigated, and the
need for further passes is checked. If no further passes are
necessary, the parameter results obtained thus far may be run
through the system one final time, with the parameters being
quantized.

The control and operation of the full encoding
process will now be described in more detail. First, the
weights are modified according to the obtained uncertainty
variances of the various sub-operands in DX_Ref,n. Pixels with
high uncertainty in a given sub-operand change field are given
lower weight for the subsequent statistical operations for this
sub-operand. These weights are then used to optimize the
multivariate statistical processes in the Interpreter 720.

The scores u_n for the various domains and sub-operands
are estimated for the different holons in the Score Estimator
1202. Also, the associated uncertainty covariances are estimated
using conventional linear least squares methodology assuming,
e.g., normally distributed noise in the residuals, and providing
corrections for the intercorrelations between the various factor
weighted loadings. The scores with small total signal effects
are biased towards zero, using the ScoreRidge parameter, for
statistical stabilization.
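
The ridge-type shrinkage described above can be sketched for a single factor as a weighted least-squares fit with a penalty term in the denominator. The function name `estimate_score`, the argument `score_ridge` (standing in for the ScoreRidge parameter) and the restriction to one factor are assumptions for illustration; the Score Estimator described here handles several intercorrelated factors jointly.

```python
def estimate_score(change_field, loading, weights, score_ridge=0.0):
    """Weighted least-squares score for one factor, shrunk towards zero.

    Minimises sum_i w_i * (d_i - u * p_i)**2 + score_ridge * u**2 over u,
    which gives u = sum(w*p*d) / (sum(w*p*p) + score_ridge).
    """
    num = sum(w * p * d for d, p, w in zip(change_field, loading, weights))
    den = sum(w * p * p for p, w in zip(loading, weights)) + score_ridge
    if den == 0.0:
        return 0.0  # no usable signal for this factor
    return num / den


loading = [1.0, 2.0, -1.0, 0.5]
weights = [1.0, 1.0, 1.0, 1.0]

# The change field is exactly 3 times the loading.
change_field = [3.0 * p for p in loading]
u_plain = estimate_score(change_field, loading, weights)
u_ridged = estimate_score(change_field, loading, weights, score_ridge=1.0)

# Residual after subtracting the factor's contribution.
residual = [d - u_plain * p for d, p in zip(change_field, loading)]
```

Because the penalty is added to the weighted signal energy, a factor whose weighted loading energy is small relative to `score_ridge` is shrunk proportionally more than a factor with a strong signal.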

The residual change field EX_n is estimated, after
subtraction of the effects of the known factors, in Residual
ChangeField Estimator 1210.

Next, the widening of the existing models X_Ref for
various domains, sub-operands and holons is attempted in the
Spatial Model Widener 1214. This is performed using the
estimated uncertainty variances and weights as part of the
input, to make sure that data elements with high certainty
dominate. The uncertainty variances of the loadings are
estimated using standard linear least squares methodology
assuming, e.g., normally distributed noise.

As part of the Widening process, the basic 3D structure
Z(0) and associated change factors Z(f), f=1,2,..., are
estimated according to the available data at that stage. In
particular, warnings for unmodelled pixels in suggest tenta- 
tive 3D modelling. 

Modification of the segmentation is accomplished by
checking the various domain data, in particular the
"unmodellability" warnings w_n and associated data in i_n,
against similar unmodelled data for adjacent frames, in order to
detect the accumulated development of unmodelled related areas.
The unmodelled parts of the image are analyzed in the Spatial
Model Extender 1216, thereby generating new holons or
modifications of existing holons in S_Ref. During the course of
segmentation, higher probability of segmentation changes is
expected along the edges of existing holons and along the edges
of and X_Ref than elsewhere. Holons that are spatially adjacent
in the reference image and temporally correlated are merged. In
contrast, holons that display inconsistent spatial and temporal
model structure are split.

Shadows and transparent objects are modelled as part
of the Widening process. This includes estimating the basic
probabilistic transparency of the holons. In a preferred
embodiment for the identification of moving shadows, groups of
adjacent pixels which in frame n display a systematic,
low-dimensional loss of light in the color space as compared to
a different frame are designated as shadow holons. The shadow
holons are defined as having dark color intensity and being
semi-transparent.

Areas in the reference image with no clear factor
structure, i.e., many low-energy factors instead of a few
high-energy factors in A or I domains, are analyzed for
spatiotemporal structures. These areas are marked for modelling
with special modelling techniques, such as modelling of
quasi-random systems such as running water. This part of the
encoder may require some human intervention in terms of the
selection of the particular special technique. The effect of
such special areas is minimized in subsequent parameter
estimations.

The encoding operations described may be used with
more complex local change field estimates dx_nm. In the
preferred embodiment, for each pixel in each sub-operand of the
forecasted frame m, only one change value (with its associated
uncertainty) is estimated and output by the Local ChangeField
Estimator 1050. In an alternative embodiment, there may be
multiple alternative change values (each with its associated
uncertainty) estimated by the Local ChangeField Estimator 1050
for each domain or sub-operand. For example, two or more
alternative potentially acceptable horizontal, vertical and
depth movements of groups of pixels may be presented as part of
da_nm in dx_nm 855 by the Local ChangeField Estimator 850. Each
of these alternatives is then moved back to the reference
position as part of DX_Ref,n 890. Subsequently, the Interpreter
attempts to model the different combinations of alternatives,
and chooses the one that produces the best result. A similarly
flexible alternative approach to local modelling is to let the
Local ChangeField Estimator 850 output only one value for each
pixel for each suboperand, as in the preferred embodiment, but
instead to replace the uncertainty (e.g., uncertainty variance
s^2_dx_nm) by local statistical covariance models that describe
the most probable combination of change alternatives. These
covariance models may then be accumulated and used by the
Interpreter to find the most acceptable combination of model
widening, extension and deepening.

II. Update models 

After all the frames of the present subsequence have
been analyzed during a particular pass and the system has
arrived at a stable model of a sequence, the model is updated
in the Temporal and Spatial Model Updaters 1206 and 1208,
respectively, in the Interpreter 720, thus allowing even more
compact and easily compressible/editable factor structures.



III. Merging subsequences

In the Multipass Controller 620, an attempt is made
to merge the present subsequence with another subsequence,
according to meta-modelling, or the technique given in Appendix
MERGE_SUBSEQUENCES. This converts the local subsequence models
into a model which is representative of more frames of the
sequence than the individual subsequences.

IV. Convergence control

At the end of each pass, the Multipass Controller 650
checks for convergence. If convergence has not been reached,
more passes are required. Accordingly, the Multipass Controller
650 modifies the control parameters and initiates the next
pass. The Multipass Controller also keeps track of the nature
and consequences of the various model developments in the
various passes, and may backtrack if certain model development
choices appear to provide unsatisfactory results.
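
The pass-by-pass control just described can be sketched as a simple loop. Everything named here is hypothetical: the function `multipass`, the caller-supplied `run_pass` callback, the single control parameter `"smoothness"`, and the use of one scalar error as the convergence measure are assumptions chosen to keep the example small, not details taken from the Multipass Controller itself.

```python
def multipass(run_pass, model, params, tol=1e-2, max_passes=20):
    """Hypothetical multipass control loop with back-tracking.

    run_pass(model, params) -> (candidate_model, error) is supplied by
    the caller. params is a dict of control parameters; the assumed
    'smoothness' entry is relaxed after each successful pass.
    """
    params = dict(params)             # do not mutate the caller's settings
    best_error = float("inf")
    history = []                      # record of every pass and its outcome
    for pass_no in range(1, max_passes + 1):
        candidate, error = run_pass(model, params)
        history.append((pass_no, dict(params), error))
        if error >= best_error:
            # Unsatisfactory development: back-track by keeping the earlier
            # model and tightening the assumption again.
            params["smoothness"] *= 2.0
            continue
        model, best_error = candidate, error
        if best_error <= tol:
            break                     # convergence reached
        params["smoothness"] *= 0.5   # relax assumptions for the next pass
    return model, best_error, history


def toy_pass(model, params):
    """Toy stand-in for one pass: move halfway towards the value 10."""
    new_model = model + 0.5 * (10.0 - model)
    return new_model, abs(10.0 - new_model)


model, error, history = multipass(toy_pass, 0.0, {"smoothness": 1.0})
```

Keeping the full history is what allows a controller of this kind to inspect the consequences of earlier choices before deciding whether to back-track.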

V. Final model optimization

Depending on the particular application, quantization
errors due to parameter compression are introduced into the
estimation of model parameters. The modelling of the sequence
is repeated once more in order to allow subsequent parameters
the opportunity to correct for the quantization errors
introduced by prior parameters. Finally, the parameters in
X_Ref and U_Seq and error correction residuals EX_Ref are
compressed and ready for storage and/or transmission to be used
by a decoder.

The internal model data may be stored using more
precision than the input data. For example in video coding, by
modelling accumulated information from several input frames of
related, but moving objects, the final internal model X_Ref may
have higher spatial resolution than the individual input
frames. On the other hand, the internal model may be stored
using completely different resolution than the input or output
data, e.g., as a compact subset of irregularly spaced key
picture elements chosen by the Model Deepener from among the
full set of available pixels, so that good output image quality
may be obtained by interpolating between the pixels in the
Mover portion of the Decoder. The present invention may also
output decoded results in a different representation than that
of the input. For example, using interpolation and extrapolation
of the temporal and spatial parameters, along with a change of
the color space, the system may convert between NTSC and PAL
video formats.

The IDLE modelling of the present invention may be
used to sort the order of input or output data elements. This
type of sorting may be applied so that the rows of individual
input or output frames are changed relative to their common
order, as part of a video encryption scheme.
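
The mechanics of reordering rows and restoring them can be sketched as follows. Note the substitution: the permutation below comes from a seeded pseudo-random generator, not from IDLE model parameters, and the names `row_permutation`, `scramble_rows` and `unscramble_rows` are hypothetical. It illustrates row reordering only and makes no claim of cryptographic strength.

```python
import random


def row_permutation(num_rows, key):
    """Key-dependent ordering of row indices (illustrative, not secure)."""
    order = list(range(num_rows))
    random.Random(key).shuffle(order)
    return order


def scramble_rows(frame, key):
    """Return the rows of the frame in the key-dependent order."""
    order = row_permutation(len(frame), key)
    return [frame[i] for i in order]


def unscramble_rows(scrambled, key):
    """Invert scramble_rows by regenerating the same ordering."""
    order = row_permutation(len(scrambled), key)
    restored = [None] * len(scrambled)
    for new_pos, old_pos in enumerate(order):
        restored[old_pos] = scrambled[new_pos]
    return restored


# A small 6 x 4 "frame" whose values encode their row and column.
frame = [[r * 10 + c for c in range(4)] for r in range(6)]
scrambled = scramble_rows(frame, key=1234)
restored = unscramble_rows(scrambled, key=1234)
```

A receiver that can regenerate the same ordering recovers the common row order exactly; without it, the rows remain displaced.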

Deleterious effects due to missing or particularly
noisy data elements in the input data may be handled by the
present system since the modelling contribution of each
individual input data element may be weighted relative to that
of the other data elements, with the individual weights being
estimated by the encoder system itself.

The preferred embodiment of the present invention
uses various two-way bi-linear factor models, each consisting
of a sum (hence the term "linear") of factor contributions,
each factor being defined as the product of two types of
parameters, a score and a loading (hence the term "bi-linear").
These parameters describe, e.g., temporal and spatial change
information, respectively. This type of modelling may be
generalized or extended. One such generalization is the use of
higher-way models, such as a tri-linear model where each factor
contribution is the product of three types of parameters,
instead of just two. Alternatively, each of the bi-linear
factors may be further modelled by its own bi-linear model.
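
A minimal sketch of the two model forms named above may make the terminology concrete. The function names, the list-of-lists layout and the numeric values are illustrative assumptions; only the sum-of-products structure comes from the text.

```python
def bilinear_reconstruct(scores, loadings):
    """Two-way model: x[n][m] = sum_f scores[n][f] * loadings[f][m].

    scores   : N x F, one row of factor scores per frame
    loadings : F x M, one row of M data elements per factor
    """
    n_factors = len(loadings)
    n_elems = len(loadings[0])
    return [[sum(row[f] * loadings[f][m] for f in range(n_factors))
             for m in range(n_elems)]
            for row in scores]


def trilinear_reconstruct(a, b, c):
    """Three-way model: x[i][j][k] = sum_f a[i][f] * b[j][f] * c[k][f]."""
    n_factors = len(a[0])
    return [[[sum(a[i][f] * b[j][f] * c[k][f] for f in range(n_factors))
              for k in range(len(c))]
             for j in range(len(b))]
            for i in range(len(a))]


# Three frames, two factors, four data elements per frame.
scores = [[1.0, 0.0],
          [0.5, 1.0],
          [0.0, 2.0]]
loadings = [[1.0, 2.0, 3.0, 4.0],
            [0.0, 1.0, 0.0, -1.0]]
frames = bilinear_reconstruct(scores, loadings)

# Single-factor tri-linear example.
cube = trilinear_reconstruct([[1.0], [2.0]], [[1.0], [3.0]], [[2.0]])
```

Each frame is a sum over factors (the "linear" part), and each term is a product of one score and one loading (the "bi" part); the tri-linear variant simply multiplies in a third parameter per factor.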

META MODELLING 

Single-sequence meta-modelling

The IDLE model parameters obtained according to the
system and method described above already have redundancies
within the individual suboperands removed. However, the model
parameters may still have remaining redundancies across domains
and suboperands. For instance, the spatial pattern of how an
object changes color intensity may resemble the spatial pattern
of how that object also moves. Thus, there is spatial
correlation between some color and movement loadings in X_Ref.
Similarly, the temporal patterns of how one object changes
color over time may resemble how that object or some other
object moves over time. In this latter case, there is temporal
correlation between some color and movement scores in U_Seq.
Meta-modelling is essentially the same as IDLE modelling,
except that the input is the set of model parameters rather
than a set of input frames.

Spatial meta-modelling

Spatial meta-modelling is essentially the same as
IDLE modelling; however, the inputs to the model are now the
individual loadings determined as part of a first IDLE model.
For each holon of the initial model X_Ref, we may collect all
the factor loadings of all colors, e.g., in the case of RGB
representations: red loadings R(f)_Ref, f=0,1,2,..., green
loadings G(f)_Ref, f=0,1,2,..., and blue loadings B(f)_Ref,
f=0,1,2,..., totalling F factors, into an equivalent single
meta-sequence consisting of F intensity "frames," each frame
being an intensity loading having the same size as the holon in
the extended reference frame. When each of the loadings is
strung out as a line, as in the Spatial Widener in the
Interpreter, the color intensity loadings form an FxM matrix,
with a total of F intensity loadings each having M pixels. A
singular value decomposition (svd) of this matrix generates
meta-factors with meta-loadings for each of the M pixels and
meta-scores for each of the F original factors. The svd yields
a perfect reconstruction of the original loadings if the number
of meta-factors equals the smaller of M or F. However, if there
are significant inter-color spatial correlations in the
original loadings, these will be accumulated in the
meta-factors, resulting in fewer than the smaller of M or F
factors necessary for proper reconstruction. The meta-scores
indicate how the F original color factor loadings are related
to each other, and the meta-loadings indicate how these
interrelations are spatially distributed over the M pixels.
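
The effect described above, where correlated loadings need fewer meta-factors than min(F, M), can be illustrated on a tiny F x M matrix. To keep the sketch free of external libraries, a power iteration with deflation is substituted for a full svd routine; it extracts one dominant meta-factor at a time. The function names and the toy red, green and blue loadings are assumptions made for illustration.

```python
import math


def dominant_meta_factor(loadings, iterations=100):
    """Leading meta-factor of an F x M loading matrix by power iteration.

    Returns (meta_scores, meta_loading): one score per original factor
    and a unit-length loading over the M pixels, so that
    meta_scores[f] * meta_loading[m] approximates loadings[f][m].
    """
    n_rows, n_cols = len(loadings), len(loadings[0])
    # Start from the row with the largest energy.
    v = list(max(loadings, key=lambda row: sum(x * x for x in row)))
    for _ in range(iterations):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]
        u = [sum(a * b for a, b in zip(row, v)) for row in loadings]
        v = [sum(loadings[f][m] * u[f] for f in range(n_rows))
             for m in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    if norm > 0.0:
        v = [x / norm for x in v]
    scores = [sum(a * b for a, b in zip(row, v)) for row in loadings]
    return scores, v


def deflate(loadings, scores, v):
    """Subtract one meta-factor's contribution from the loading matrix."""
    return [[loadings[f][m] - scores[f] * v[m] for m in range(len(v))]
            for f in range(len(loadings))]


def energy(matrix):
    return sum(x * x for row in matrix for x in row)


# F = 3 color loadings over M = 5 pixels, strongly intercorrelated.
pattern = [1.0, 2.0, 0.0, -1.0, 3.0]
extra = [0.0, 0.0, 1.0, 0.0, 0.0]
red = pattern
green = [0.5 * x for x in pattern]
blue = [2.0 * x + 0.1 * e for x, e in zip(pattern, extra)]
color_loadings = [red, green, blue]

scores1, load1 = dominant_meta_factor(color_loadings)
residual1 = deflate(color_loadings, scores1, load1)
scores2, load2 = dominant_meta_factor(residual1)
residual2 = deflate(residual1, scores2, load2)

explained_by_first = 1.0 - energy(residual1) / energy(color_loadings)
```

Here the first meta-factor alone accounts for almost all of the energy in the three color loadings, and two meta-factors reconstruct them to within rounding error, although min(F, M) is three.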

Similarly, if there are spatial intercorrelations
between how one holon moves in the three coordinate directions,
spatial meta-modelling of the smile loadings in the horizontal,
vertical and depth directions will reveal these
intercorrelations. Likewise, if there are spatial
intercorrelations between how one holon changes with respect to
two or more probabilistic properties, these probabilistic
redundancies can be consolidated using spatial meta-modelling
of the loadings of the various probabilistic properties.

Finally, the spatial meta-modelling may instead be
performed on the color intensity, movement and probabilistic
change loadings simultaneously for each holon or for groups of
holons. Again, the spatial meta-loadings represent the spatial
correlation redundancies within the original IDLE model, and
the spatial meta-scores quantify how the original IDLE factor
loadings are related to each other with respect to spatial
correlation. As in standard principal component analysis, if
the original input loading matrix is standardized, the
distribution of eigenvalues from the svd indicates the degree
of intercorrelation found; see H. Martens and T. Naes,
Multivariate Calibration, Chapter 3 (John Wiley & Sons, 1989),
which is incorporated herein by reference.

Such direct svd on spatial loadings may be considered
the equivalent of spatial blush modelling at the meta level.
Similarly, the spatial meta-modelling using only meta-blush
factors may be extended to full IDLE modelling, with
meta-reference, meta-blush, meta-smile and meta-probabilistic
models. One of the original loadings may be used as a
meta-reference. The spatial meta-smile factors then define how
regions in the different original loadings need to be moved in
order to optimize their spatial redundancy. The meta-holons
need not be the same as the original holons. Spatial
meta-holons may be defined as either portions of the original
holons or groups of the original holons, having regions with
similar systematic spatial inter-loading correlation patterns.
Other probabilistic spatial meta-suboperands such as spatial
meta-transparency allow blending of the different spatial
meta-holons.

Temporal meta-modelling

Temporal meta-modelling is essentially the same as
IDLE modelling; however, the input to the model is now the
scores determined as part of a first IDLE model. In much the
same manner as the meta-modelling of the original spatial
change factor loadings in X_Ref, an IDLE meta-modelling may be
applied to the sequence scores in U_Seq. The temporal
meta-analysis may be performed on some or all of the suboperand
factors for some or all of the holons over some or all of the
sequence frames.

The temporal meta-factor loadings thus indicate how
the different frames n=1,2,...,N in the original video sequence
relate to each other, and the temporal meta-factor scores
f=1,2,...,F (for whichever suboperands and holons are being
meta-analyzed together) indicate how the scores of the
different factors in the original IDLE model relate to each
other. Single svd on the NxF matrix of scores then models whatever




temporal redundancies existed between the factors of the
original IDLE model.

Such simple svd of the factor scores corresponds to
temporal meta-blush modelling. Full temporal IDLE
meta-modelling allows a reference which is a function of time,
rather than a function of space as is the case with standard
IDLE modelling. In this situation, meta-holons represent
event(s) or action(s) over time, meta-smile factors represent a
time shift of the event(s) or action(s), and meta-blush factors
represent the extent of the event(s) or action(s). The
meta-reference may be chosen to be one of the original factor
score series through the video sequence.

The temporal meta-smile factors can therefore be used
to model systematic, yet complicated, temporal deviations away
from the meta-reference pattern for the other change patterns
represented by the original IDLE model. For instance, if the
movements of one object (e.g., a trailing car) in the original
sequence followed in time the movements and color changes of
another object (e.g., brake lights of a lead car), but
exhibited varying, systematic delays (e.g., due to varying
acceleration patterns), this would give rise to temporal
meta-smile factors. The loadings of the temporal meta-smile
factors indicate how the different frames in the original input
sequence relate to each other, and the temporal meta-smile
scores indicate how the different factors in the original IDLE
model relate to each other.

The temporal meta-holons generally correspond to
discrete temporal events that are best modelled separately from
each other. Meta-transparency factors may then be used to
smoothly combine different temporal holons. The model
parameters of the meta-modelling processes described above may
in turn themselves be meta-modelled.

When meta-modelling is used in the Encoder
("meta-encoding"), the Decoder system may have corresponding
inverse meta-modelling ("meta-decoding").

Multi-sequence meta-modelling

The single-sequence meta-modelling described above
may be further applied to multi-sequence meta-modelling. One
primary application of multi-sequence meta-modelling is video
coding, where it is used to relate IDLE models from different,
but possibly related, video sequences. One way to merge two or
more related IDLE models is to meta-model their loadings or
scores directly as described above. Such direct meta-modelling
of spatial structures is useful if the extended reference
images are the same or very similar. However, the direct
spatial meta-modelling is difficult to accomplish if the
sequences have differently sized extended reference images.
Furthermore, although physically achievable, the result is
rather meaningless if the extended reference image sizes are
the same, but the holons are different.
The direct temporal meta-modelling is also useful if
the sequences are of the same length and reflect related
events, such as the leading/trailing car example discussed
above. Meta-modelling is difficult to perform if the sequences
cannot be separated into sub-sequences of the same length, and
becomes rather meaningless if the sequences do not reflect
related events.

Indirect multi-sequence meta-modelling

Indirect multi-sequence meta-modelling is the use of
two or more stages of meta-modelling: one stage for making two
or more model parameter sets compatible, and a second stage of
meta-modelling of the resulting compatible sets. Indirect
multi-sequence meta-modelling is more flexible than the
meta-modelling described above, in that it allows a single
model to model a larger class of phenomena.

In the preliminary phase of spatial meta-modelling,
the extended reference images and the associated factor
loadings of one or more sub-sequences are used to establish a
new extended reference image, e.g., by simple IDLE modelling.
An alternative method of linking together two spatial
sub-sequence models in order to form a new extended reference
image is described in further detail in the Appendix
MERGE_SUBSEQUENCES. This latter approach is applicable if the
sub-sequences overlap each other by at least one frame.

Preliminary temporal meta-modelling achieves temporal
compatibility of one or more temporal reference sub-sequences
and associated factor scores with the temporal reference
sub-sequence of another sub-sequence. This may be accomplished
using a simple IDLE model to model the temporal domain.

Once compatibility has been achieved in the spatial
and/or temporal domains, the different sub-sequence models may
then be jointly meta-modelled as if they belonged to a single
sub-sequence.

Combining of models using meta-modelling

The scores and loadings from one model may be
combined with the loadings and scores from other models.
Alternatively, the scores or loadings of one model may be
replaced with other scores or loadings from an alternate
source, e.g., a real-time joystick input, and be combined using
meta-modelling. Lip synchronization between sound and image
data in video dubbing is one example of combining models using
meta-modelling. Specifically, smile scores may be estimated
from an already established IDLE image mouth movement model.
These scores may then be matched to a corresponding time series
representing the sounds produced by the talking mouth. Lip
synch may then be accomplished using meta-modelling of the
image scores from the already established model and the sound
time series loadings to provide optimal covariation of the
image data with the sound time series.

Another application of combining models using
meta-modelling of IDLE parameters is the modelling of
covariations between the IDLE parameters of an already
established model, and external data. For example, if IDLE
modelling has been used to model a large set of related medical
images in a database, the IDLE scores for selected images may
be related to the specific medication and medical history for
each of the subjects of the corresponding images. One method
for performing this covariation analysis is the Partial Least
Squares Regression #2 ("PLS2"), as described in H. Martens and
T. Naes, Multivariate Calibration, pp. 146-163 (John Wiley &
Sons, 1989), which is incorporated herein by reference.

Joint vs separate movement modeling for the different image 
input channels. 

The typical input for a color video sequence has six
input quantities: 3 implicit position dimensions (vertical,
horizontal and depth) and 3 explicit intensities (e.g., R, G,
B). In the preferred embodiment of the basic IDLE system, it is
assumed that the three intensity channels represent input from
the same camera and hence information relating to the same
objects. Thus, the same segmentation and movements (S and
opacity, smile and nod) are assumed for all three color or
intensity channels. The color channels are only separated in
the blush modelling. Further model redundancy is then
eliminated by joint multivariate modelling of the various
loadings as described above.

Alternatively, the basic IDLE system may be modified
to have stronger connectivity between input quantities, i.e.,
to model blush information in the different color channels
simultaneously, by requiring each blush factor to have one
common score for each frame, but different loadings for each
color channel. This gives preference to intensity changes with
the same temporal dynamics in all color channels for a holon or
a group of holons, and could for instance be used in order to
stabilize the estimation of the factors, as well as for editing
and compression.

Instead, the basic IDLE system may be modified to
have weaker connectivity between input quantities, where
movement is modeled more or less independently for each color
channel separately. This could be computationally advantageous
and could give more flexibility in cases where the different
channels in fact represent different spatial information.
One example of independent movement modelling is the
case of multi-sensor geographical input images from a set of
surveillance satellites equipped with different sensors. Based
on one or more repeated recordings of the same geographical
area taken at different times from different positions, and
possibly exhibiting different optical aberrations, different
times of recording and different resolutions, the IDLE system
could be used for effective normalization, compression and
interpretation of the somewhat incongruent input images. The
different sensor channels may exhibit quite different
sensitivities to different spatial structures and phenomena.
For example, radar and magnetometric imaging sensors may be
sensitive to land and ocean surface height changes, whereas
some photon-based imaging sensors, e.g., UV, visible and
infrared cameras, may have varying sensitivities to various
long-term climatic changes and vegetation changes, as well as
short-term weather conditions. In this situation, the IDLE
system may require separate movement and blush modelling for
the independently observed channels.

Another example of this type of system is input data
obtained from several medical imaging devices (MRI, PET, CT)
repeatedly scanning a given subject over a period of time in
order to monitor cancer growth, blood vessel changes or other
time-varying phenomena. Since each device requires separate
measurements, the subject will be positioned slightly
differently for each different device and for each scan over
the course of the repeated measurements. The movement of
biological tissue typically does not follow affine
transformations. Thus, IDLE smile factors may be a more
flexible, yet sufficiently restrictive way of representing body
movements and allow the required normalization. Each imaging
device could then have its own subset of smile factors from its
extended reference position to the results for each individual
set of scans from the various imaging devices. With the
resulting normalization, blush factors and local smile factors
that give early warning of slowly developing tissue changes may
be detected. This is particularly effective if the extended
reference position is normalized, e.g., by meta-modelling, for
the different imaging devices for maximum spatial congruence.
In this way, the joint signal from all the channels of the
different imaging devices may be used to stabilize the
modelling against measurement noise, e.g., by requiring that
the blush factor scores for all channels be identical and that
only the loadings be different.

Generalizations from analysis of two-dimensional inputs
(images)

The IDLE modelling system described above may be used
for input records of a different format than conventional
two-dimensional video images. For instance, it may be used for
one-dimensional data, such as a time series of lines from a
line camera, or individual columns in a still image.

In the latter case, the IDLE system may be used as part of a
still image compression system. In this type of application, the
input information to the still image encoder is lines or columns
of pels instead of two-dimensional frame data. Each input record
may represent a vertical column in the two-dimensional image.
Thus, the still image IDLE loading parameters are column-shaped
instead of two-dimensional images. The time dimension of the
video sequence (frames n=1,2,...) is replaced in this case by
the horizontal pel index (column number) in the image.
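
As a rough illustration of treating image columns as input records, the sketch below builds a column-shaped reference (the mean column) and one column-shaped loading, with one score per column number. It uses a plain rank-one power iteration as a stand-in for the IDLE estimation, which the text does not specify at this point; the function names and the single-factor restriction are assumptions made for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def columns_as_records(image):
    """Turn a row-major image (a list of rows) into one record per column."""
    return [list(column) for column in zip(*image)]


def fit_one_factor(records, iterations=50):
    """Fit record[j] ~ reference + scores[j] * loading with a single factor."""
    height = len(records[0])
    # Column-shaped reference: the mean of all column records.
    reference = [sum(r[i] for r in records) / len(records) for i in range(height)]
    residuals = [[r[i] - reference[i] for i in range(height)] for r in records]
    # Start from the residual with the largest energy.
    loading = list(max(residuals, key=lambda r: dot(r, r)))
    for _ in range(iterations):
        # Scores: projection of each column residual on the current loading.
        scores = [dot(r, loading) for r in residuals]
        # Loading: projection of the residuals on the current scores.
        loading = [sum(s * r[i] for s, r in zip(scores, residuals))
                   for i in range(height)]
        norm = dot(loading, loading) ** 0.5
        if norm == 0.0:
            break  # every column equals the reference; no factor is needed
        loading = [value / norm for value in loading]
    scores = [dot(r, loading) for r in residuals]
    return reference, loading, scores


def reconstruct_column(reference, loading, score):
    """Rebuild one column from the reference, the loading and its score."""
    return [ref + score * lod for ref, lod in zip(reference, loading)]
```

Here the column number plays the role that the frame number n plays for video: each column contributes one score, while the reference and the loading are shared by all columns.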



Simultaneous modeling for different input dimensions

If the input to the still-image IDLE codec is an RGB still
image, then the three color channels (or a transform of them,
like YUV) may be coded separately or jointly, as discussed above
for the video IDLE codec. Likewise, if the input to the
still-image IDLE codec is a set of spatial parameters of the
extended image model from a video IDLE codec, the different
input dimensions (blush factors, smile factors, probabilistic
factors) may be coded separately or jointly.

The present invention, which has been described above in the
context of a video compression application, may be applied to
any of a number of information processing and/or acquisition
applications. For example, in the case of the processing of
image sequences or video sequences for modelling or editing a
video sequence (a set of related images) in black/white or
color, the modelling is carried out with respect to IDLE
parameters in such a way as to optimize the editing usefulness
of the model parameters. The model parameters are possibly in
turn related to established parameter sets, and other known
editing model elements are forced into the model. Groups of
parameters are related to each other in hierarchical fashion.
The sequence is edited by changing temporal and/or spatial
parameters. Sets of related video sequences are modelled jointly
by multi-sequence meta-modelling, i.e., each related sequence is
mapped onto a 'Reference sequence' by a special IDLE meta-model.

The present invention may also be applied to compression for
storage or transmission. In this application, a video sequence
is modelled by IDLE encoding, and the resulting model parameters
are compressed. Different compression and representation
strategies may be used depending on the bandwidth and storage
capacity of the decoding system. Temporal sorting of the change
factors, and pyramidal representation and transmission of the
spatial parameters, may be used to increase the system's
robustness in the face of transmission bandwidth limitations.

Similarly, the present invention may be applied to the
colorization of black/white movies. In this case, the
black/white movie sequences are modelled by IDLE encoding. The
spatial holons in I_Ref are colored manually or automatically,
and these colors are automatically distributed throughout the
sequence. Sets of related sequences may be identified for
consistent coloring.

In addition, the present invention may be used in simulators,
virtual reality, games and other related applications. The
relevant image sequences are recorded and compressed. When
decoding, a few chosen scores may be controlled by the user,
instead of using the recorded scores. Similarly, other scores
may be varied according to the user-controlled scores. For
example, in the case of a traffic simulator: record sequences of
the interior of a car and of the road and the terrain; identify
those scores, probably nod scores, that correspond directly to
how the car moves; determine those scores that change indirectly
based on those nod scores, such as smile/blush factors for
illumination, shadows, perspective etc.; and set up a
mathematical model that defines how the car reacts to certain
movements of the control inputs, such as the steering wheel,
accelerator pedal, brake pedal etc. The user can then sit in a
simulated car interior, with a display in front and perhaps also
on the sides. The simulated controllers are then connected to
the "direct" factors, which in turn may be used to control the
"indirect" factors. The resulting images will give a very
naturalistic effect.

The present invention also has application in real-time systems
such as video telephone, television, and HDTV. Extreme
compression ratios for very long sequences may be attained,
although there may be bursts of spatial information at the onset
of new sequences. This application also includes real-time
encoding and decoding. Depending on the computational power
available, different degrees of IDLE algorithm complexity may be
implemented. For instance, information in the spatial domain may
be represented by a standard Gaussian Pyramid (ref), with the
IDLE encoder algorithm operating on variable image size
depending on the particular application's capacity and needs.
The encoder Interpreter parts for widening, extending or
deepening do not have to be fully real-time for each frame. The
complexity of the scenes and the size of the image then define
the compression ratios and coding qualities which may be
attained.

The present invention may also be used in remote camera
surveillance. By employing a remote real-time encoder at the
source of the image information, both interpretation and
transmission of the camera data are simplified. The general
blush factors model normal systematic variations such as various
normal illumination changes, while general smile factors and nod
factors correct for normal movements (e.g., moving branches of a
tree). The automatic outlier detection and spatial model
extender detect systematic redundancies in the unmodelled
residuals and generate new holons, which in turn may be
interpreted by searching in a database of objects before
automatic error warnings are issued. Each object in the database
may have its own smile, blush and probability factor loadings
and/or movement model. The compressed parameters may be stored
or transmitted over narrow bandwidth systems, e.g.,
twisted-pair copper telephone wire transmission of TV camera
output from security cameras in banks etc., or over extremely
narrow bandwidth systems, such as are found in deep water or
outer space transmission.

Images from technical cameras, i.e., images not intended for
direct human visualization, may also be modelled/compressed
using the IDLE technique. The more 'color' channels, the more
effective the meta-modelling compression of the spatial IDLE
models. Examples of this application include multi-wavelength
channel camera systems used to monitor biological processes in
the Near Infrared (NIR) or Ultra-Violet/Visible wavelength
ranges (e.g., for recording fluorescence).

The IDLE system may also be used in conjunction with
multichannel satellites and/or aerial photography. Repeated
imaging of the same geographical area under different
circumstances and at different times may be modelled by IDLE
encoding. Such parameterization allows effective compression for
storage and transmission. It also provides effective
interpretation tools indicating the systematic intensity
variations and movements, and how they change over time. If the
same geographical area is imaged from slightly different
positions or under different measuring conditions, then an extra
IDLE preprocessing model may be used for improved alignment,
allowing the geographical area to differ quite significantly
(e.g., more or less daylight) and yet allow accurate
identification.

The IDLE approach of the present invention may also be utilized
in cross-domain coordination or lip synch applications for movie
production and in sound dubbing. For multivariate calibration,
the temporal parameter scores from an IDLE video model of the
mouth region of talking persons are related to the temporal
parameters for a speech sound model (e.g., a subband or a CELP
codec, or an IDLE sound codec), e.g., by PLS2 regression. This
regression modelling may be based on data from a set of movie
sequences of people speaking with various known image/sound
synchronizations, thus modelling the local lip synch delay for
optimizing the lip-sound synchronization. For each new sequence
with lip synch problems, the same image and sound model score
parameters are estimated. Once estimated, this local lip synch
delay is corrected or compensated for by modifying the temporal
IDLE parameters and/or sound parameters.

The IDLE principle may also be applied to database compression
and/or searching. There are many databases in which the records
are related to each other, but these relationships are somewhat
complicated and difficult to express by conventional modelling.
Examples of this type of application include police photographs
of human faces ("mugshots"), various medical images, e.g., MRI
body scans, photographs of biological specimens, photographs of
cars etc. In such cases, the content of the database can be
analyzed and stored utilizing IDLE model parameters. The IDLE
representation of related, but complicated information in a
database offers several advantages, viz., high compression,
improved searchability and improved flexibility with respect to
the representation of the individual records in the database.
The compression which may be achieved depends on how many
records can be modelled and how simple the IDLE model used is,
i.e., on the size and complexity of the database content.

The improved searchability (and interpretability) stems from the
fact that the database search in the case of IDLE representation
may be performed using the low-dimensional set of parameters
corresponding to factor scores (e.g., a low number of nod, smile
and blush scores), as opposed to the large amount of original
input data (e.g., 200,000 pixels per image). Compression
techniques using fractals or DCT do not yield similar searchable
parameters. The few IDLE score variables may in turn be related
statistically to external variables in the database, providing
the capability to search for larger, general patterns, e.g., in
the case of medical images and medical treatments. The improved
flexibility in the representation of the records in the database
stems from the fact that the bilinear IDLE factors allow
whatever flexibility is desired. Equipping the holon models with
a few smile and blush factors allows systematic unknown
variations to be quantified during the pattern recognition
without statistical overparameterization.
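
To make the searchability argument concrete, the sketch below ranks database records by the distance between their score vectors and the score vector of a query, so that each comparison touches a handful of scores rather than every pixel. The squared Euclidean distance and the function name are assumptions chosen for the illustration; the text does not prescribe a particular similarity measure.

```python
def nearest_records(query_scores, database_scores, k=1):
    """Return the indices of the k records whose scores are closest to the query.

    query_scores: list of factor scores (e.g. nod, smile and blush scores)
    database_scores: one such list per stored record
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(
        range(len(database_scores)),
        key=lambda i: squared_distance(query_scores, database_scores[i]),
    )
    return ranked[:k]
```

With, say, ten scores per record, a search over the whole database compares ten numbers per record instead of the roughly 200,000 pixel values mentioned above.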
IDLE modelling in database representation may be used for a
variety of record types in databases, such as image databases
containing human faces, e.g., medical, criminal; real estate
promotional material; or technical drawings. In these
situations, the IDLE modelling may allow multiple use of each
holon in each drawing; the holons could in this special case be
geometrical primitives. Additional applications include sound
(music, voice), events (spatiotemporal patterns), and situations
(e.g., weather situations which combine various meteorological
data for various weather structures or geographical locations,
for a certain time span).

The IDLE principle may also be used for improved pattern
recognition. In matching unknown records against various known
patterns, added flexibility is obtained when the known patterns
also include a few smile and blush factor loadings whose scores
are estimated during the matching process. In searching an input
image for the presence of a given pattern, added flexibility is
obtained by allowing the holons to include a few smile and blush
loadings, whose scores are estimated during the searching
process. This type of pattern recognition approach may be
applied to speech recognition.

The IDLE principle may also be applied to medical and industrial
imaging devices, such as ultrasound, MRI, CT etc., in order to
provide noise filtering, automatic warnings, and improved
interpretation. In medical ultrasound imaging, noise is a major
problem. The noise is so strong that filtering on individual
frames to reduce the noise will often also destroy important
parts of the wanted signal. Much of the noise is random and
additive with an expectation of zero, and if many samples could
be collected from the same part of the same object, then the
noise could be reduced by averaging samples. It is often
impossible to keep the measured object or subject steady, and
the observed movement can seem to be quite complex. However, the
observed movement is due to a limited number of reasons, and so
the displacements will need relatively few IDLE smile and nod
factors. In the reference position, noise can be averaged away.
The smile and blush factors can also be useful for interpreting
such sequences. Finally, ultrasound sequences represent such
large amounts of raw data that they are difficult to store. Most
often only one or a few still images are stored. The compression
aspect of the present invention is therefore highly applicable.

The IDLE principle of the present invention may also be used for
credit card and other image database compression applications.
For example, in the case of compression, whenever there are sets
of images with similar features, this set of images could be
regarded as a sequence and compressed with the IDLE technique.
This is readily applicable to databases of facial images. If all
the loads are known at both the encoder and the decoder side,
this means that only the scores need to be stored for each
individual. These scores would then be able to fit into the
storage capacity of the magnetic stripe on a credit card, and so
could form the basis for an authentication system.

Other applications of the IDLE principle include still image
compression; radar (noise filtering, pattern recognition, and
error warnings); automatic dynamic visual art (in an art gallery
or for advertisement, two or more computers with, e.g., flat
color LCD screens on which the output from IDLE models is shown;
the score parameters of the IDLE model on one computer are
functions of the screen output of the other IDLE models, plus
other sensors, in a self-organizing system); consumer products
or advertisement (one computer with, e.g., a color flat LCD
screen displays output from an IDLE model whose scores and
loadings are affected by a combination of random number
generators and viewer behavior); and disjoint sensing and
meta-observation (when a moving scene has been characterized by
different imaging sensors at sufficiently different times such
that the images cannot be simply superimposed, IDLE modelling
may be used to normalize the moving scene for simpler
superimposition).

The IDLE system may also be used for data storage device
normalization (magnetic, optical). Specifically, if the physical
positioning or field intensity of the writing process varies, or
the reading process or the medium itself is varying and
difficult to model and correct for by conventional modelling,
IDLE modelling using nod, smile and/or blush factors may correct
for systematic, but unknown variations. This may be particularly
critical for controlling multilayer read/write processes. In
such an application, the already written layers may serve as
input data for the stabilizing latent smile and blush factors.

The IDLE principle of the present invention also has numerous
sound applications. For example, sound, such as music, voice or
electromechanical vibrations, may be modelled and compressed
utilizing parameterization by fixed translation/nod, systematic
shift/smile, intensity/blush and overlap/opacity in the various
domains (e.g., time, frequency). A holon in sound may be a
connected sound pattern in the time and/or frequency domains.
Additional sound applications include sound
modification/editing; industrial process and monitoring,
automotive, ships, aircraft. Also, searching may be carried out
in sound databases (similar to searching in image or video
databases discussed above). It is thus possible to combine IDLE
modelling in different domains, such as sound modelling both in
the time and the frequency domains.

The IDLE principle may also be used in weather forecasting;
machinery (robot quality control monitoring using a camera as a
totally independent sensor and allowing the IDLE system to learn
its normal motions and warn of wear and tear and abnormal
behavior); and robot modelling which combines classical robot
connectivity "hard" nod trees with IDLE smile models for
"softly" defined movements, using such "soft" and "hard" robot
modelling in conjunction with blush factors to model human body
motion.

The IDLE principle of the present invention may also be used for
forensic research in the areas of finger prints, voice prints,
and mug shot images.

While the invention has been particularly shown and described
with reference to preferred embodiments thereof, it will be
understood by those skilled in the art that various changes in
form and detail may be made therein without departing from the
spirit and scope of the invention.

DECODER APPENDIX

1. Overview
2. Frame Reconstruction
   2.1 Intuitive explanation
   2.2 INRec Formula
   2.3 Holonwise loading-score matrix multiplication
   2.4 Smile
   2.5 Nod
   2.6 Move
   2.7 Ad hoc residuals
3. References
1. Overview

In order to increase readability, colloquial abbreviations are
used in this description instead of the indexed and subscripted
symbolism used elsewhere in the application.

The decoder performs the following steps for each frame n:

Receives updates of the segmentation S field part of domain
X_Ref, S.

Receives updates of the scores ("Sco") for the blush intensity
changes ("Blu"), BluSco; the vertical and horizontal address
smile changes ("Smi"), SmiSco; the 3D depth changes (Z), ZSco;
and the probabilistic changes ("Prob"), ProbSco, for each holon.

Receives updates of the Blush, Smile, Prob and Z loadings for
X_Ref (abbreviated "Loads" or "Lod"): BluLod, SmiLod, ProbLod,
ZLod.

Receives updates of the affine transformation ("Nod") matrices,
NodMat, containing the nod scores.

Receives optional error residuals ("Res") = (BluRes, SmiRes,
ZRes, ProbRes).

Reconstructs the intensity of the present frame (i_n, here
termed IN) based on the S field, scores, loads and Nod matrices,
to produce a reconstructed i_n hat result ("INRec").



2. Frame Reconstruction

2.1 Intuitive explanation

Blush the image by changing the intensities of the pixels in the
various color channels of the reference image according to the
blush factors.

Smile the image by changing the address values of the pixels in
the reference image according to the smile factors (including
the Z factors).

Change the probabilistic properties of the image by changing the
probabilistic suboperands, such as transparencies, in the
reference image according to the prob factors.

Nod the smiled coordinates by changing the smiled addresses of
the pixels according to the nod matrices.

Move the pixels from the blushed reference image into the
finished image so that each pixel ends up at its smiled and
nodded coordinates, so that "holes" in the image are filled with
interpolated values, so that the pixel with the highest Z value
"wins" in the cases where several pixels end up at the same
coordinates, and so that pixels are partly transparent if they
have a Prob value lower than 1.

Optional: Add residual corrections to the reconstructed
intensities.

Optional: Post-process the resulting output image to provide
smooth blending of holons, especially along edges formed during
the Mover operator due to movements. In the preferred
embodiment, this is accomplished by blurring along all segment
edges in the moved images.
2.2 INRec Formula

The formula for computing INRec is as follows:

INRec = Move(IRef + BluSco*BluLod, S, ...
             Nod([V H] + SmiSco*SmiLod, Z + ZSco*ZLod, NodMat, S), ...
             ProbSco*ProbLod)



2.3 Holonwise loading-score matrix multiplication

In an expression such as "BluSco*BluLod", the multiplication
does not imply traditional matrix multiplication, but rather a
variation referred to as holonwise loading-score matrix
multiplication. That is, each holon has its own score, and for
each pixel, the S field must be analyzed in order to determine
which holon that pixel belongs to, and this holon number must be
used to select the correct score from BluSco.

To compute BluSco*BluLod:

    For each Pixel:
        Sum = 0
        For each Factor:
            Sum = Sum + BluSco[S[Pixel], Factor] * BluLod[Factor, Pixel]
        Result[Pixel] = Sum

The same applies to SmiSco*SmiLod, ZSco*ZLod and
ProbSco*ProbLod.
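
The pseudocode above can be sketched in Python as follows. The data layout is an assumption made for the example: pixels are addressed by a single flattened index, holon numbers in the S field start at 0, scores are stored as `sco[holon][factor]` and loads as `lod[factor][pixel]`.

```python
def holonwise_multiply(sco, lod, s_field):
    """Holonwise loading-score multiplication.

    sco[holon][factor]: score of each factor for each holon
    lod[factor][pixel]: loading of each factor at each pixel
    s_field[pixel]:     holon number that the pixel belongs to
    Returns one value per pixel.
    """
    n_factors = len(lod)
    result = []
    for pixel, holon in enumerate(s_field):
        total = 0.0
        for factor in range(n_factors):
            # The S field selects which holon's score applies to this pixel.
            total += sco[holon][factor] * lod[factor][pixel]
        result.append(total)
    return result
```

The only difference from an ordinary score-times-loading product is the lookup through `s_field`, which lets two pixels with identical loadings receive different contributions when they belong to different holons.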



2.4 Smile

Smiling pixels means displacing the reference position
coordinates according to the address change field. The address
change field may have values in each coordinate dimension, such
as the vertical, horizontal and depth dimensions (V, H, Z), and
may be defined for one or more holons. Each address change field
may be generated as the sum of the contributions of smile
factors, and each change factor contribution may be the product
of temporal scores and spatial loadings.

In order to displace information of pixels away from the
reference position, the amount of motion for each of these
pixels in the reference position (the address change field
DA_Ref,n) may be computed first, and the actual moving operation
then takes place later in the Mover stage of the decoder.



For each pixel with coordinates V, H, Z in the reference
position, its new address after it has been moved is computed
by:

VSmi = V + SmiScoV*SmiLodV
HSmi = H + SmiScoH*SmiLodH
ZSmi = Z + SmiScoZ*SmiLodZ

In these three expressions, V and H are the coordinates of each
pixel in the reference position, while Z is the value of the Z
field for that pixel. The multiplication is holonwise
loading-score matrix multiplication, as defined in section 2.3.
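
A minimal sketch of the smile step, under the same assumed data layout as for the holonwise multiplication (flattened pixel index, holon numbers starting at 0). The dictionaries keyed by "V", "H" and "Z" are a convenience of this example, not notation from the description.

```python
def holonwise(sco, lod, s_field):
    """Holonwise loading-score multiplication: one value per pixel."""
    return [
        sum(sco[holon][f] * lod[f][p] for f in range(len(lod)))
        for p, holon in enumerate(s_field)
    ]


def smile(coords, sco, lod, s_field):
    """Displace reference coordinates by the address change field.

    coords[axis]: reference coordinate of every pixel for axis "V", "H" or "Z"
    sco[axis], lod[axis]: smile scores and loads for that axis
    """
    smiled = {}
    for axis in ("V", "H", "Z"):
        # Address change field for this axis: sum of score * loading.
        change = holonwise(sco[axis], lod[axis], s_field)
        smiled[axis] = [c + d for c, d in zip(coords[axis], change)]
    return smiled
```

The function only computes the new addresses; no pixel is moved here, in line with the statement that the actual moving takes place later in the Mover stage.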

2.5 Nod

The function of the Nod is to modify the values of the
coordinates of each pixel, which may be conceptualized as a
vector having homogeneous coordinates:

ASmi = [VSmi HSmi ZSmi 1]

The nodded coordinates, ANod, are then given by:

\[
\begin{bmatrix} \mathrm{VNod} \\ \mathrm{HNod} \\ \mathrm{ZNod} \\ \mathrm{Dummy} \end{bmatrix}
=
\begin{bmatrix}
T_{11} & T_{12} & T_{13} & 0 \\
T_{21} & T_{22} & T_{23} & 0 \\
T_{31} & T_{32} & T_{33} & 0 \\
T_{41} & T_{42} & T_{43} & 1
\end{bmatrix}
*
\begin{bmatrix} \mathrm{VSmi} \\ \mathrm{HSmi} \\ \mathrm{ZSmi} \\ 1 \end{bmatrix}
\]

which may be equivalently expressed as:

ANod = NodMat * ASmi
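
The product ANod = NodMat * ASmi can be sketched for a single pixel as below. The sketch treats ASmi as a column vector, as in the matrix expression, and takes NodMat as a plain 4x4 list of rows; how the entries of NodMat are chosen for each holon is outside the scope of the example.

```python
def nod(nod_mat, v_smi, h_smi, z_smi):
    """Apply a 4x4 nod matrix to the smiled coordinates of one pixel.

    Returns (VNod, HNod, ZNod); the fourth (dummy) component is discarded.
    """
    a_smi = [v_smi, h_smi, z_smi, 1.0]  # homogeneous coordinates
    a_nod = [
        sum(nod_mat[row][col] * a_smi[col] for col in range(4))
        for row in range(4)
    ]
    return a_nod[0], a_nod[1], a_nod[2]
```

Since the INRec formula passes the S field to Nod, a full decoder would presumably select the matrix per holon before calling such a function; that selection is an assumption and is not shown here.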

2.6 Move

Move the pixels into the finished image so that each pixel ends
up at its smiled and nodded coordinates, in such a way that
"holes" in the image are filled with interpolated values, that
the pixel with the highest Z value "wins" in the cases where
several pixels end up at the same coordinates, and that pixels
are partly transparent if they have a Prob value lower than 1.
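
A simplified sketch of the Move step is given below. It is an illustration under stated assumptions, not the decoder's Mover: target addresses are rounded to the nearest pixel, a Z-buffer decides which pixel wins, and holes are filled by linear interpolation along each row only. Transparency (Prob values below 1) is left out.

```python
def move(intensity, v_nod, h_nod, z_nod, height, width):
    """Forward-map pixels to their nodded addresses; the highest Z wins.

    Positions that receive no pixel are left as None ("holes").
    """
    image = [[None] * width for _ in range(height)]
    depth = [[None] * width for _ in range(height)]
    for value, v, h, z in zip(intensity, v_nod, h_nod, z_nod):
        row, col = int(round(v)), int(round(h))
        if not (0 <= row < height and 0 <= col < width):
            continue  # the pixel moved outside the frame
        if depth[row][col] is None or z > depth[row][col]:
            image[row][col] = value
            depth[row][col] = z
    return image


def fill_holes_in_row(row):
    """Linearly interpolate None entries that lie between filled neighbours."""
    filled = [i for i, value in enumerate(row) if value is not None]
    out = list(row)
    for left, right in zip(filled, filled[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            out[i] = (1 - t) * row[left] + t * row[right]
    return out
```

Holes at the border of a row have no neighbour on one side and are left unfilled in this sketch; a real implementation would need a policy for them.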

If the loadings X(f)_Ref, f=1,2,..., are also moved together
with the level 0 image, X(0)_Ref, the same interpolation and
Z-buffering strategies are used for f=1,2,... as for f=0 above.

A description of methods of moving and interpolating pixels may
be found in, e.g., George Wolberg, Digital Image Warping,
Chapter 1 (IEEE Computer Society Press 1990), which is
incorporated herein by reference. A description of Z-buffering
may be found in, e.g., William A. Newman and Robert F. Sproull,
Principles of Interactive Computer Graphics, Chapter 24 (McGraw
Hill 1984), which is incorporated herein by reference. A
description of how to combine partly transparent pixels may be
found in, e.g., John Y.A. Wang and Edward H. Adelson, "Layered
Representation for Image Sequence Coding", IEEE ICASSP, Vol. 5,
pp. 221-224, Minneapolis, Minnesota, 1993, which is incorporated
herein by reference.




Appendix MERGE_SUBSEQUENCES

Check if the present subsequence model can be merged with other
subsequence models

A. Call the present reference model 'position I', and another
   reference model 'position II'. Move the spatial model
   parameters of the extended reference image for the present
   subsequence, X_I, to the position of the extended reference
   image for another subsequence, X_II, using a frame n which
   has been modelled by both of the subsequences:

   1. Since:

      In Model I:  i_n hat(I)  = Move(DA_{I,n} of (I_I + DI_{I,n}))
      In Model II: i_n hat(II) = Move(DA_{II,n} of (I_II + DI_{II,n}))

      and this generalizes from i_n hat to all domains in
      x_n hat:

      In Model I:  x_n hat(I)  = Move(DA_{I,n} of (X_I + DX_{I,n}))
      In Model II: x_n hat(II) = Move(DA_{II,n} of (X_II + DX_{II,n}))

   2. We can move the estimate for frame n back to the two
      respective reference positions:

      In Model I:  x_n hat(I)@I   = Move(DA_{n,I} of x_n)
      In Model II: x_n hat(II)@II = Move(DA_{n,II} of x_n)

      If the two models mainly contain smile, as opposed to
      blush modelling, we may now move model I to frame n's
      estimated position, using model I, and then move model I
      into model II's position using the reverse of model II:

      X_I@II = Move(DA_{n,II} of (Move(DA_{I,n} of (X_I + DX_{I,n}))))

      The obtained model I loadings given in model II's
      position, X_I@II, may now be compared to and merged into
      X_II (with local smile and blush estimation and model
      extension, plus detection of parts in X_I lost in X_I@II).
      This yields a new and enlarged model that summarizes both
      models I and II.

      The new and enlarged model X_II may now similarly be
      merged with another model III with which it has another
      overlapping frame, etc. Subsequences are merged together
      as long as this does not involve unacceptable degradation
      in compression and/or reproduction quality.

APPENDIX SIMPLIFIED ENCODER

Purpose:

Show one way of implementing a simplified IDLE encoder.

Contents:

1 EncSeq
2 ExpressSubSeqWithModels
3 ExpressWithModels
4 ExtractSmiFactSubSeq
5 ExtractBluFactSubSeq
6 SegSubSeq
7 AllocateHolon
8 MoveBack
9 AnalyseMove
10 Other required methods
10.1 Move
10.3 Smi2Nod
10.4 UpdateModel
10.5 Transmit

1 EncSeq

Input:

Seq: Sequence of frames; one per row
ErrTol: Error tolerance

Output:

SmiLod: Smile loads
SmiSco: Smile scores
BluLod: Blush loads
BluSco: Blush scores

Informal description:

Work forward through the sequence. Whenever frames cannot be
reconstructed with an error less than the tolerance using known
smile and blush factors, introduce a new factor. Do this by
first trying to introduce a smile factor and then trying to
introduce a blush factor. Choose the factor that improves the
reconstruction the most.

During this process, different parts of the image may be found
to move independently of or occlude each other. Each time this
is detected, determine which parts of the image move coherently,
isolate the smallest and define this as one or more new holons,
make new room by increasing the size of the image, place the new
holons there, and let a smile factor compensate for this
repositioning.

Whenever new information is revealed (that is, parts of the
image cannot be moved back to reference position with any
fidelity using the existing nod or smile factors), find which
holons are nearby and try to model the new information under the
assumption that it is an extension to each of these holons. If a
good modelling behaviour can be found, extend the holon, else
create a new holon.

Take into account how much memory the decoder has left:

If it has much free memory, prefer factors that span many frames
and so are believed to be more "correct" (even though they alone
may describe each individual frame with less fidelity) by
relaxing the test error tolerance TestErrTol.

If it has little free memory, it is important that the required
fidelity be reached with the few remaining factors, so the test
error tolerance TestErrTol must be tightened.
Method:

IRef = First image in the sequence Seq
Set SmiLod and BluLod to empty

While NextFraNo <= length(Seq)
    [SmiSco, BluSco, FailFraNo] =
        ExpressSubSeqWithModels(Seq, NextFraNo,
        IRef, SmiLod, BluLod, ErrTol)

    If FailFraNo <= length(Seq):
        Try different ways of updating the model:

        If the decoder has much memory left
        (based on Transmit history):
            Set TestErrTol to a large value
        else if the decoder has little memory left:
            Set TestErrTol to a value close to ErrTol

        FromFraNo = FailFraNo

        [NewSmiLod, nSmiFra, TotSmiErr] =
            ExtractSmiFactSubSeq(Seq, FromFraNo,
            TestErrTol, SmiLod, BluLod, SmiSco, BluSco)

        [NewBluLod, nBluFra, TotBluErr] =
            ExtractBluFactSubSeq(Seq, FromFraNo,
            TestErrTol, SmiLod, BluLod, SmiSco, BluSco)

        [NewS, nSegFra, TotSegErr] =
            SegSubSeq(Seq, FromFraNo, S, TestErrTol)

        Based on nSmiFra, nBluFra and nSegFra, and TotSmiErr,
        TotBluErr and TotSegErr:
            Either select one of Smile or Blush to be included
            in the model, or change the segmentation

        If Smile is selected:
            Transmit(SmiLod)
            Update smile factors:
            [SmiLod, SmiSco] = UpdateModel(SmiLod, SmiSco, NewSmiLod)
        else if Blush is selected:
            Transmit(BluLod)
            Update blush factors:
            [BluLod, BluSco] = UpdateModel(BluLod, BluSco, NewBluLod)
        else (Segment is selected):
            Transmit(NewS - S)
            S = NewS

End of method EncSeq



WO 95/08240 PCT/US94/10190

2 ExpressSubSeqWithModels 



Purpose : 

Express a Sequence with existing models consisting of 
loads in smile and blush domain, as far as the error 
tolerance will allow. 



[SmiSco, BluSco, NextFraNo] ... 

ExpressSubSeqWithModels (Seq, NextFraNo, 
ErrTol, IRef, SmiLod, BluLod, SmiSco, BluS- 

CO) 

Input : 

Seq: The sequence to be expressed 

NextFraNo: Starting point of the subsequence within Seq 

ErrTol: Error tolerance; the ending criterion for the 

subsequence 

IRe f : Re f e r en c e image 

SmiLod, BluLod: Smile load 

SmiSco, BluSco: Already known smile and blush scores 

Output : 

SmiSco: Smile scores 
BluSco: Blush scores 

FailFraNo: Number of the frame where the modelling failed
due to ErrTol

Method:

Set current frame number N to NextFraNo
Repeat
    IN = Seq[N]
    Try to model IN using the known factors:
    [INRec, SmiSco[N], BluSco[N]] =
        ExpressWithModels(IN, IRef, SmiLod, BluLod, S)
    Increase the frame number N
until Error(INRec, IN) > ErrTol or IN was the last frame in
Seq

NextFraNo = N

End of method ExpressSubSeqWithModels
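
The loop above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names express_sub_seq and frame_error are hypothetical, the error measure (mean absolute difference) is an assumption because the text does not define Error(), and the per-frame modelling step is passed in as a callable. Unlike the repeat/until form above, the sketch returns the index of the failing frame itself.

```python
from typing import Callable, List, Tuple

Frame = List[float]


def frame_error(rec: Frame, target: Frame) -> float:
    """Mean absolute difference; an assumed stand-in for Error()."""
    return sum(abs(r - t) for r, t in zip(rec, target)) / len(target)


def express_sub_seq(
    seq: List[Frame],
    next_fra_no: int,
    err_tol: float,
    express_frame: Callable[[Frame], Frame],
) -> Tuple[List[Frame], int]:
    """Model frames from next_fra_no onward until one misses err_tol.

    Returns the accepted reconstructions and the index of the first
    frame that could not be modelled (len(seq) if every frame fits).
    """
    recs: List[Frame] = []
    n = next_fra_no
    while n < len(seq):
        rec = express_frame(seq[n])          # plays the role of ExpressWithModels
        if frame_error(rec, seq[n]) > err_tol:
            break                            # modelling failed at frame n
        recs.append(rec)
        n += 1
    return recs, n


if __name__ == "__main__":
    # Toy "model" that cannot represent intensities above 2.0.
    clip = lambda frame: [min(x, 2.0) for x in frame]
    frames = [[1.0, 1.0], [1.0, 2.0], [5.0, 5.0]]
    print(express_sub_seq(frames, 0, 0.5, clip)[1])
```

In this toy run the first two frames are reproduced exactly and the third exceeds the tolerance, so the subsequence ends there.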




3 ExpressWithModels 



Purpose : 

Express a frame with the known models, i.e. calculate the
scores for the existing loads that give the best fit between
IN and a reconstruction

[INRec, SmiSco, BluSco] = ExpressWithModels(IN, IRef, SmiLod,
    BluLod, S, SmiSco, BluSco)

Input : 

IN: One particular frame
IRef: Reference image
SmiLod: Known smile loads
BluLod: Known blush loads
S: S field

Optional input:

SmiSco, BluSco: Initial estimates for the smile and blush
scores

Output:

INRec: Reconstructed image
SmiSco, BluSco: Improved estimates for the smile and blush
scores

Informal description:

Find an optimal set of scores by trial and error, i.e. by
a search method like Simplex. (For a description, see
chapter 10.4, "Downhill Simplex Method in Multidimensions",
in William H. Press et al., "Numerical Recipes" (Cambridge
University Press), which is incorporated herein by
reference.)

Select new smile scores as variations of the previ- 
ously best known smile scores, estimate blush scores 
by moving the difference between the decoded and the 
wanted image back to reference position and then 
projecting on the existing blush loads. 

Judge how well each new image approximates the wanted
image, and use this as a guideline for how to select
new variations of the smile scores.

Method:

For each holon:
    Repeat
        For a small number of variants:
            Change the smile scores slightly
            Decode an image using the new smile scores
            and the old blush scores
            Move the difference between the decoded and
            the wanted image back to reference position
            Project the difference onto blush loads,
            producing new BluSco
            Decode an image using the new SmiSco and
            BluSco
        Select the best variant (i.e. keep the scores
        that gave best reconstruction)
    until the reconstructed image is good enough or the
    reconstruction is not improving

End of ExpressWithModels method
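
The trial-and-error search can be sketched in Python under strong simplifying assumptions: the image is one-dimensional, there is a single holon, the smile is one global integer shift (so the smile score is that shift), and there is one blush load. The search tries the neighbouring shifts rather than running a true downhill simplex. All function names are hypothetical and only illustrate the loop structure.

```python
from typing import List, Tuple


def move(img: List[float], shift: int) -> List[float]:
    """Shift a 1-D image right by `shift` pels, repeating the edge pel."""
    n = len(img)
    return [img[min(max(i - shift, 0), n - 1)] for i in range(n)]


def project(residual: List[float], load: List[float]) -> float:
    """Least-squares score of `residual` on a single load."""
    denom = sum(l * l for l in load)
    return sum(r * l for r, l in zip(residual, load)) / denom if denom else 0.0


def decode(iref, blu_load, blu_sco: float, smi_sco: int) -> List[float]:
    """Blush the reference image, then move it by the smile shift."""
    blushed = [r + blu_sco * l for r, l in zip(iref, blu_load)]
    return move(blushed, smi_sco)


def sq_err(a, b) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def express_with_models(target, iref, blu_load, smi_sco=0, blu_sco=0.0,
                        max_iter=20) -> Tuple[List[float], int, float]:
    best = (sq_err(decode(iref, blu_load, blu_sco, smi_sco), target),
            smi_sco, blu_sco)
    for _ in range(max_iter):
        _, smi0, blu0 = best
        candidates = []
        for trial in (smi0 - 1, smi0, smi0 + 1):      # vary the smile score
            rec = decode(iref, blu_load, blu0, trial)  # old blush score
            diff = [t - r for t, r in zip(target, rec)]
            diff_at_ref = move(diff, -trial)           # back to reference position
            new_blu = blu0 + project(diff_at_ref, blu_load)
            err = sq_err(decode(iref, blu_load, new_blu, trial), target)
            candidates.append((err, trial, new_blu))
        cand = min(candidates)                         # select the best variant
        if cand[0] >= best[0]:
            break                                      # not improving any more
        best = cand
    _, smi, blu = best
    return decode(iref, blu_load, blu, smi), smi, blu


if __name__ == "__main__":
    iref = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    load = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    wanted = decode(iref, load, 0.5, 2)
    print(express_with_models(wanted, iref, load)[1:])
```

A greedy neighbour search of this kind can stall in a local minimum; the simplex method cited above is one way of making the search more robust.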
4 ExtractSmiFactSubSeq

Purpose:

Extract one smile factor from a subsequence

[NewSmiLod, nSmiFra, TotSmiErr] = ExtractSmiFactSubSeq(Seq,
    FromFraNo, ErrTol, IRef, SmiLod, BluLod, SmiSco, BluSco)



Input : 

Seq: The sequence
FromFraNo: Number of first frame in subsequence. This is the
same as NextFraNo in EncSeq
ErrTol: Error tolerance
IRef: Reference image
SmiLod, BluLod: Known smile and blush loads
SmiSco, BluSco: Scores to be updated

Output:

nSmiFra: Number of frames used for estimating the smile
factor
NewSmiLod: One new smile load
TotSmiErr: Total remaining error after smiling



Informal description: 
For each frame, as long as smile seems reasonable:
    Reconstruct the wanted frame IN as well as possible
    using only the known loads; call this IM
    Find how IM should be smiled in order to look like IN
    Map this smile back to reference position
    UpdateModel

Return the first factor of the final model

Method : 

TestFraNo = FromFraNo
TotErrSmi = 0
Set SmiTestLod to empty

Repeat
    IN = Seq[TestFraNo]

    Establish an image IM that reconstructs IN as well as
    possible based on the reference image and known smile
    and blush factors, and as a side effect also compute
    the return field from M to Reference position:
    [IM, SmiSco[TestFraNo], BluSco[TestFraNo]] =
        ExpressWithModels(IN, IRef, SmiLod, BluLod,
        SmiScoInit, BluScoInit)
    SmiRefToM = SmiSco[M] * SmiLod
    IM = Move(IRef + BluSco[M]*BluLod, SmiSco[M]*SmiLod)

    Find how IM should be made to look like IN when only
    smiling is allowed, and at the same time calculate
    the confidence of this smile field:
    [SmiMToN, SmiConfMToN] = EstMov(IM, IN, TestSmiLod)

    Move the smile and its certainty back to reference
    position:
    SmiMToNAtRef = MoveBack(SmiMToN, SmiRefToM)
    SmiConfMToNAtRef = MoveBack(SmiConfMToN, SmiRefToM)

    Calculate the error when only smiling is used:
    ErrSmi = IN - Move(IRefBlushed, SmiRefToM + SmiMToNAtRef)
    TotErrSmi = TotErrSmi + ErrSmi
    [SmiTestLod, SmiTestSco] =
        UpdateModel(SmiTestLod, SmiTestSco, ErrSmi)
    TotSmiConfMToNAtRef = TotSmiConfMToNAtRef +
        SmiConfMToNAtRef
    TestFraNo = TestFraNo + 1
until
    The energy is too much spread among the factors in
    SmiTestLod, or
    ErrSmi is large

The last frame should not be included in the summary, so:
    Undo the effect of the last UpdateModel
    Undo the effect of the last error summation:
    TotErrSmi = TotErrSmi - ErrSmi
    TotSmiConfMToNAtRef = TotSmiConfMToNAtRef -
        SmiConfMToNAtRef

NewSmiLod = SmiTestLod[1]
nSmiFra = FromFraNo - NextFraNo

End of ExtractSmiFactSubSeq method
5 ExtractBluFactSubSeq

Purpose:

Extract one blush factor from a subsequence

[NewBluLod, nBluFra, TotBluErr] = ExtractBluFactSubSeq(Seq,
    NextFraNo, ErrTol, IRef, SmiLod, BluLod, SmiSco, BluSco)

Input : 

Seq: The sequence
NextFraNo: Number of next frame, i.e. start of subsequence
ErrTol: Error tolerance, which may define end of subsequence
IRef: Reference image
SmiLod: Known smile load
BluLod: Known blush loads
SmiSco: Smile scores
BluSco: Blush scores

Output:

NewBluLod: New blush load
nBluFra: Number of frames for which this blush is defined
TotBluErr: Total remaining error after blushing



Method:

TotBluErr = 0
TestFraNo = NextFraNo
Set BluTestLod to empty

Repeat
    If scores for IM are not available from
    ExtractSmiFactSubSeq:
        Establish an image IM that reconstructs IN as
        well as possible based on the reference image
        and known smile and blush factors, and as a side
        effect also compute the return field from M to
        Reference position:
        [IM, SmiSco[TestFraNo], BluSco[TestFraNo]] =
            ExpressWithModels(IN, IRef, SmiLod,
            BluLod, SmiScoInit, BluScoInit)
        SmiRefToM = SmiSco[M] * SmiLod

    Try to make IM look like IN by blushing:
    BluMToN = IN - IM

    Move this blush back to reference position:
    BluMToNAtRef = MoveBack(BluMToN, SmiRefToM)

    Calculate the error when only blushing is used:
    ErrBlu = IN - Move(IRefBlushed + BluMToNAtRef,
        SmiRefToM)
    [BluTestLod, BluTestSco] =
        UpdateModel(BluTestLod, BluTestSco, ErrBlu)
    TotBluErr = TotBluErr + ErrBlu
    TestFraNo = TestFraNo + 1
until
    The energy is too much spread out among factors in
    BluTestLod, or
    Sum(ErrBlu) is large

The last frame should not be included in the summary, so:
    Undo the effect of the last UpdateModel
    Undo the effect of the last error summation:
    TotBluErr = TotBluErr - ErrBlu

NewBluLod = BluTestLod[1]

End of ExtractBluFactSubSeq method
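
As an illustration of how one factor (one load with one score per frame) might be drawn from the data gathered over a subsequence, the following Python sketch extracts the first principal component by power iteration. This is an assumption made for illustration: the text above delegates the factor update to UpdateModel, whose details are given elsewhere, and the function name first_factor is hypothetical.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def first_factor(residuals: List[List[float]],
                 n_iter: int = 50) -> Tuple[List[float], List[float]]:
    """Return (load, scores) of the dominant factor of the residuals.

    residuals[i][j] is pel j of frame i, already at reference position.
    The load has unit length; scores[i] * load approximates frame i.
    """
    n_pels = len(residuals[0])
    # Start from the frame with the most energy (an arbitrary choice).
    load = list(max(residuals, key=lambda r: dot(r, r)))
    for _ in range(n_iter):
        scores = [dot(r, load) for r in residuals]
        # Multiply by X^T X: accumulate score-weighted frames.
        new = [sum(s * r[j] for s, r in zip(scores, residuals))
               for j in range(n_pels)]
        norm = math.sqrt(dot(new, new))
        if norm == 0.0:
            break                      # no energy left to model
        load = [x / norm for x in new]
    scores = [dot(r, load) for r in residuals]
    return load, scores


if __name__ == "__main__":
    frames = [[0.0, 1.0, 2.0, 1.0, 0.0],
              [0.0, 2.0, 4.0, 2.0, 0.0],
              [0.0, -1.0, -2.0, -1.0, 0.0]]
    load, scores = first_factor(frames)
    print([round(s, 3) for s in scores])
```

When the residuals share one spatial pattern, as in the toy data, a single factor reproduces every frame; energy spread over many factors would correspond to the stopping criterion in the method above.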




6 SegSubSeq 

Purpose : 

Propose a new segmentation of the holons, and report how 
much this improves the modelling 

[S, TotSegErr, nSegFra] = SegSubSeq(Seq, FromFraNo, SmiLod,
    SmiSco, S)

Input: 

Smi: Smile field 

FromFraNo: Number of first frame in the subsequence
SmiLod: Smile loads 
SmiSco: Smile scores 
S: Previous S field 

Output: 

S: New, updated S field 

TotSegErr: Total error associated with segmenting 
nSegFra: Number of frames used for estimating the segmen- 
tation 

Informal description: 

Use various heuristic techniques to improve how the
reference image is split into separate holons.
Check how easy it is to extract either new smile or new
blush factors under the assumption of this new split.
Report back the best result.



Method:

Repeat
    TestFraNo = FromFraNo
    Repeat
        IN = Seq[TestFraNo]
        Smi = SmiSco[TestFraNo] * SmiLod

        Split one holon into two if necessary:
        For each holon in S:
            Compute a nod matrix from Smi for that holon
            If the sum of errors between nod matrices and
            pels is large:
                Split the holon along the principal
                component of the errors

        Join two holons into one if necessary:
        For each holon in S:
            If the nod matrix is very similar to the nod
            matrix of another holon:
                Join the two holons

        Let edge pels with bad fit change holon:
        INRec = Move(IRef + BluSco*BluLod, SmiSco*SmiLod)
        For each pel, at position v,h, in INRec that is on
        the edge of a holon:
            If the pel fits better on the neighbouring
            holon, let the pel belong to the neighbouring
            holon

        Pick up pels that don't belong to any holon:
        VisInFromAtTo = AnalyseMove(Smi)
        Make a new holon out of pels where
        VisInFromAtTo[pel] < Threshold

        TestFraNo = TestFraNo + 1
    until SmiSco[TestFraNo] is no longer available from
    earlier runs of ExtractSmiFactSubSeq
until convergence

[NewSmiLod, nSmiFra, TotSmiErr] = ExtractSmiFactSubSeq(Seq,
    FromFraNo, TestErrTol, SmiLod, BluLod, SmiSco, BluSco)

[NewBluLod, nBluFra, TotBluErr] = ExtractBluFactSubSeq(Seq,
    FromFraNo, TestErrTol, SmiLod, BluLod, SmiSco, BluSco)

If Smile was "better" than Blush:
    TotSegErr = TotSmiErr
    nSegFra = nSmiFra
else
    TotSegErr = TotBluErr
    nSegFra = nBluFra

End of SegSubSeq method
7 AllocateHolon

Purpose:

SegSubSeq will need to change the spatial definition of
holons. Here is one example of an operation that is
needed, namely the one to allocate a new holon in the
Reference image.

[S, SmiLod, BluLod, SmiSco, BluSco] = AllocateHolon(S,
    SNewHolon, Smi, SmiLod, BluLod, SmiSco, BluSco)

Input:

S: Old S field, before updating

SNewHolon: S field for one or more new holons

Output : 

S: New, updated S field 

Method: 

For each new holon in S: 

Find enough free space in S, if necessary increase 
the size of S 

Find a free holon number, put this into each new pel 
position in S 

Put the pels of SNewHolon into the new space

Give the new holon a new smile factor capable of
moving the holon from the new reference position back
to its last position

Reformat the score tables accordingly
8 MoveBack

Purpose:

Move the contents of an image back, e.g. from N to M
position or from M to Ref position. This is almost an
inverse of Move.

IBack = MoveBack(IOut, SmiBack, SBack)

Input:

IOut: Input image, in Moved Out position, e.g. IM
SmiBack: Smile field, in Back position, e.g. Ref
SBack: S field, in Back position

Output:

IBack: Image moved back, e.g. to reference position

Method:

For each pel at position v,h in SBack:
    Interpolate IBack[v,h], using two-way linear
    interpolation, from the four pels in IOut that surround
    the sub-pixel position (v+SmiV[v,h], h+SmiH[v,h])
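
The interpolation step can be written out as a short Python sketch. It makes two assumptions that the text does not state: sampling positions that fall outside the image are clamped to the border, and the S field is ignored, so every pel is treated alike. The function name move_back is hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def move_back(i_out: Image, smi_v: Image, smi_h: Image) -> Image:
    """Fetch each pel of IBack from IOut by two-way linear interpolation.

    smi_v and smi_h hold the vertical and horizontal components of the
    smile field in Back position.
    """
    rows, cols = len(i_out), len(i_out[0])
    i_back = [[0.0] * cols for _ in range(rows)]
    for v in range(rows):
        for h in range(cols):
            # Sub-pixel source position, clamped to the image (assumption).
            y = min(max(v + smi_v[v][h], 0.0), rows - 1.0)
            x = min(max(h + smi_h[v][h], 0.0), cols - 1.0)
            y0, x0 = int(math.floor(y)), int(math.floor(x))
            y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
            fy, fx = y - y0, x - x0
            # Interpolate along h in the two surrounding rows, then along v.
            top = (1 - fx) * i_out[y0][x0] + fx * i_out[y0][x1]
            bottom = (1 - fx) * i_out[y1][x0] + fx * i_out[y1][x1]
            i_back[v][h] = (1 - fy) * top + fy * bottom
    return i_back


if __name__ == "__main__":
    image = [[0.0, 10.0], [20.0, 30.0]]
    half = [[0.5, 0.5], [0.5, 0.5]]
    print(move_back(image, half, half)[0][0])
```

With a smile field of half a pel in both directions, the pel at the top left is the average of the four input pels.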




9 AnalyseMove 



Purpose: 

Determine features of a smile field:
    For each pel in a From image: Will it be visible in
    the To image?
    For each pel in a To image: Was it visible in the
    From image?

[VisInToAtFrom, VisInFromAtTo] = AnalyseMove(SmiFrom, SFrom)

Input:

SmiFrom: Smile field, in From position, to be analyzed
SFrom: S field, in From position

Output:

VisInToAtFrom: Visibility in To image at From position:
    For each pel in a From image:
        1 if the corresponding pel in the To image is
        visible
        0 otherwise

VisInFromAtTo: Visibility in the From image at To
position:
    For each pel in a To image:
        1 if the corresponding pel in the From image is
        visible
        0 otherwise



Method:

Generate VisInFromAtTo:
    Initialize VisInFromAtTo to all zeros
    For each pel, at position v,h, in SmiFrom:
        VisInFromAtTo[int(v+SmiV[v,h]),
            int(h+SmiH[v,h])] = 1
    For each pel, at position v,h, in VisInFromAtTo:
        Replace VisInFromAtTo[v,h] with the majority
        value of itself and its four neighbours

Generate VisInToAtFrom:
    [Dummy2, SmiRet] = Move(Dummy1, Smi)
    Initialize VisInToAtFrom to all zeros
    For each pel, at position v,h, in SmiRet:
        VisInToAtFrom[int(v+SmiRetV[v,h]),
            int(h+SmiRetH[v,h])] = 1
    For each pel, at position v,h, in VisInToAtFrom:
        Replace VisInToAtFrom[v,h] with the majority value of
        itself and its four neighbours
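
The first half of the method (marking target positions and then smoothing by majority vote) can be sketched in Python as follows. The sketch assumes that target positions outside the image are simply skipped and that border pels vote with the neighbours they actually have; neither detail is specified above. The function names are hypothetical.

```python
from typing import List

Field = List[List[float]]
Mask = List[List[int]]


def majority_filter(vis: Mask) -> Mask:
    """Replace each pel by the majority of itself and its 4-neighbours."""
    rows, cols = len(vis), len(vis[0])
    out = [[0] * cols for _ in range(rows)]
    for v in range(rows):
        for h in range(cols):
            votes = [vis[v][h]]
            for dv, dh in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nv, nh = v + dv, h + dh
                if 0 <= nv < rows and 0 <= nh < cols:
                    votes.append(vis[nv][nh])
            out[v][h] = 1 if 2 * sum(votes) > len(votes) else 0
    return out


def vis_in_from_at_to(smi_v: Field, smi_h: Field) -> Mask:
    """For each pel of the To image: was it visible in the From image?"""
    rows, cols = len(smi_v), len(smi_v[0])
    vis = [[0] * cols for _ in range(rows)]
    for v in range(rows):
        for h in range(cols):
            # int() truncates toward zero, as in the pseudocode above.
            tv, th = int(v + smi_v[v][h]), int(h + smi_h[v][h])
            if 0 <= tv < rows and 0 <= th < cols:
                vis[tv][th] = 1
    return majority_filter(vis)


if __name__ == "__main__":
    zero = [[0.0] * 4 for _ in range(3)]
    right = [[1.0] * 4 for _ in range(3)]
    for row in vis_in_from_at_to(zero, right):
        print(row)
```

When every pel moves one position to the right, the leftmost column of the To image receives no pel from the From image and is reported as newly uncovered.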
10 Other required methods

10.1 Move

Purpose: Move the contents of an image according to a
Smile field

[IMoved, Ret] = Move(IFrom, Smi, S)
as described in . . .



10.2 EstMov

Purpose:

Estimate the movement (i.e. Smile field) from one
frame to another, together with the certainty of the
estimate

[Smi, SmiConf] = EstMov(IFrom, ITo)

Input:

IFrom: From-image
ITo: To-image

Output:

Smi: Smile field
SmiConf: Smile confidence: how sure we can be of Smi

Method:

E.g. any of the methods described in "Optic Flow
Computation, A Unified Perspective", Ajit Singh, IEEE
Computer Society Press 1991, ISBN 0-8186-2602, which
uses the term "Optical flow field" much like a Smile
field is used in this context.



10.3 Smi2Nod

Purpose: Compute Nod matrices from Smile fields

NodMat = Smi2Nod(Smi, S)
as described in

10.4 UpdateModel

[NewLod, NewSco] = UpdateModel(OldLod, OldSco, NewData)
as described in
10.5 Transmit

Purpose:

Make the computed data available for the decoder so
it can decode the sequence

Transmit(Data)

Method:

If Data is a spatial load:
    Compress Data using conventional still image
    compression techniques
else if Data is an update of an S field:
    Compress Data using run-length encoding
else if Data represents scores:
    Compress Data using time series compression
    techniques

Send Data to the receiver via whatever communication
medium has been selected
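
For the S field branch, run-length encoding can be illustrated with the following Python sketch. It assumes the S field update has been flattened into a one-dimensional list of holon numbers in scan order; the text does not specify the layout, and the function names are hypothetical.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal values into (value, run length) pairs."""
    runs: List[List[int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([value, 1])    # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run length) pairs back into the original list."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


if __name__ == "__main__":
    s_row = [0, 0, 0, 2, 2, 1]         # holon numbers along one scan line
    print(rle_encode(s_row))
```

Because holons are spatially contiguous, a scan line of an S field typically consists of a few long runs, which is why this simple scheme suits that kind of data.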
Appendix

Notation

= (Equals sign):

    The expression to the right of the sign is evaluated,
    and the result is assigned to the variable or
    structure indicated by the identifier to the left of
    the sign.

    If the expression to the right results in several
    output values, a corresponding list of identifiers
    is given inside brackets on the left side of the
    sign.

() (Parentheses):

    After an identifier, a pair of parentheses indicates
    that the identifier denotes a defined function to be
    evaluated or executed, and the identifiers given
    inside the parentheses represent variables or
    structures that are sent to the function as input
    parameters.

[] (Square brackets):

    One use of square brackets is defined in the
    paragraph about the Equals sign.

    Another use is to indicate indexing: When a pair of
    square brackets appears after an identifier, this
    means that the identifier refers to an array or
    matrix of values, and the expression inside the
    square brackets selects one of those values.

Naming

Mnemonic names are used:

"Smi" is used instead of "DA" for Smile
"Blu" is used instead of "DI" for Blush
"Lod" denotes loads
"Sco" is used instead of "U" for scores

Pre- and postfixes are used instead of subscripts, and
bold characters are not used, e.g.

"SmiMToN" is used instead of DA^. 
We Claim:

1. A method for converting samples of an input
signal to an encoded signal composed of a plurality of
component signals each representing a characteristic of the input
signal in a different domain, said input signal being comprised
of data samples organized into records of multiple samples,
with each sample occupying a unique position within its record,
characterized in that each component signal is formed as the
combination of a plurality of factors, each factor being the
product of a score signal and a load signal, the score signal
defining the variation of data samples from record to record
and the load signal defining the relative variation of a
subgroup of samples in different positions of a record.



2. The method in accordance with claim 1 wherein a
set of reference component signal values is provided which
represents a reference pattern of samples and in each record
the input signal is represented by a plurality of component
change signal values for each record, each component change
signal value being equal to the difference between reference
pattern of samples and the record.

3. The method of claim 2 wherein each record has
the same number of samples arranged in a multi-dimensional
array, a first of said component signals representing the
magnitude of samples and a second of said component signals
representing the position of a sample in the array.

4. The method of claim 3 wherein a component
change signal may result in several pixels of the reference
image being mapped to a common pixel of one of the frames, the
intensity of the common pixel being equal to a weighted sum of
the intensities of the several pixels.

5. The method of claim 1 wherein at least one of a
set of load signals and a set of score signals is selected for
each component signal so as to be statistically representative
of variations in the corresponding characteristic among all
records.

6. The method of claim 3 wherein the number of
factors and the precision of factors are selected so that the
storage space required therefor will not exceed a predefined
amount.

7. The method of claim 3 further comprising
providing a plurality of error signals each corresponding to
one of the component signals, each error signal providing
correction to the extent that the corresponding component
signal does not represent the corresponding characteristic of
the input signal within a predefined range.

8. The method of claim 7 wherein the number of
factors and the precision of factors is selected to achieve
error signals which remain below a predefined threshold value.

9. The method of claim 8 wherein the number of
factors and the precision of factors are selected so that the
storage space required therefor will not exceed a predefined
amount.

10. The method of claim 1 further comprising providing
a plurality of error signals each corresponding to one of
the component signals, each error signal providing correction
to the extent that the corresponding component signal does not
represent the corresponding characteristic of the input signal
within a predefined range.

11. The method in accordance with claim 10 wherein a
set of reference component signal values is provided which
represents a reference pattern of samples and in each record
the input signal is represented by a plurality of component
change signal values for each record, each component change
signal value being equal to the difference between reference
pattern of samples and the record.



12. The method of claim 1 wherein each record has
the same number of samples arranged in a multi-dimensional
array, a first of said component signals representing the
magnitude of samples and a second of said component signals
representing the position of a sample in the array.

13. The method of claim 12 wherein a component
change signal may result in several pixels of the reference
image being mapped to a common pixel of one of the frames, the
intensity of the common pixel being equal to the sum of the
intensities of the several pixels.

14. The method of claim 12 wherein the input signal
is a conventional video signal, each sample is a pixel of a
video image, each record is a frame of video, said first
component signal represents pixel intensity and said second
component signal represents the location of a pixel in a frame.

15. The method of claim 14 further comprising
providing a plurality of error signals each corresponding to
one of the component signals, each error signal providing
correction to the extent that the corresponding component
signal does not represent the corresponding characteristic of
the input signal within a predefined range.

16. The method in accordance with claim 1 wherein a
set of reference component signal values is provided which
represents a reference pattern of samples and in each record
the input signal is represented by a plurality of component
change signal values for each record, each component change
signal value being equal to the difference between reference
pattern of samples and the record.

17. The method of claim 16 wherein a component
change signal may result in several pixels of the reference
image being mapped to a common pixel of one of the frames, the
intensity of the common pixel being equal to a weighted sum of
the intensities of the several pixels.

18. The method of claim 16 wherein a component
change signal may result in several pixels of the reference
image being mapped to a common pixel of one of the frames, the
intensity of the common pixel being equal to the difference
between a constant and the sum of the intensities of the
several pixels.

19. The method of claim 16 wherein a component
change signal may result in several pixels of the reference
image being mapped to a common pixel of one of the frames, said
method further comprising defining a depth for each of the
several pixels, the intensity of the common pixel being made
equal to the intensity of the pixel among the several pixels
which has the least depth.

20. The method of claim 19 wherein the depth of
pixels is defined as a separate domain represented by a third
component signal.

21. The method of claim 16 wherein the reference
image is provided with a collection of holons, the collection
of holons containing every different holon appearing among all
the frames of the input signal.

22. The method of claim 21 wherein the location of a
pixel within the reference image is represented in a first
system of coordinates and the location of a pixel within at
least one of the holons is represented in a different system of
coordinates.

23. The method of claim 21 wherein the location of a 
pixel within different holons is represented in a different 
system of coordinates. 

24. The method of claim 21 wherein the holons 
include a set of pixels exhibiting coordinated behavior in at 
least one domain, and at least one of a load signal and score 
signal of at least one component signal operates only on said 
set of pixels. 

25. A method for producing a set of load and score
signals for use in the method of claim 2 comprising the steps of:

a. determining the plurality of component
change signal values as the difference between each record and
the reference pattern of samples;

b. performing principal component analysis on
the plurality of component change signal values to extract a
plurality of loads;

c. projecting the plurality of component
change signal values on the plurality of loads to produce a
set of score values which are applied to the plurality of loads
to produce an approximated record;

d. determining the difference between each
approximated record and each record;

e. repeating steps c and d until the difference
between each approximated record and each record is less
than a predetermined value.

26. A method for producing a set of load and
score signals for use in the method of claim 25, wherein the
principal component analysis is a weighted principal component
analysis.

27. A method for producing a set of load and
score signals for use in the method of claim 16, comprising the
further step of extending the set of reference component
signals to include additional component signals.

28. A method for decoding an encoded signal
composed of a plurality of component signals in different
domains to an input signal comprised of data samples organized
into records of multiple samples, with each sample occupying a
unique position within its record, said encoded signal
represented as a combination of a plurality of factors, each factor
being the product of a score signal and a load signal, the
score signal defining the variation of data samples from record
to record and the load signal defining the relative variation
of a subgroup of samples in different positions of a record,
said method utilizing a reference pattern of samples,
comprising the steps of:

a. multiplying each load signal by its associated
score signal to produce each factor;

b. combining the factors produced in step a;

c. modifying the set of reference component
signal values according to the combined factors produced in
step b to produce the records of a reproduced input signal.



29. A method for decoding an encoded signal as in
claim 28 wherein at least one of the load signals and score
signals is provided on a storage medium.

30. A method for decoding an encoded signal as in
claim 28, wherein the reference component signal values are
provided on the storage medium.

31. A method for decoding an encoded signal as in
claim 28 wherein the method comprises the further step of
receiving at least one of the load signals and score signals
from a remote location over a communications medium.

32. The method of claim 31 wherein the reference
component signal values are also received over the
communications medium.

33. A method for editing an encoded signal
composed of a plurality of component signals in different
domains to an input signal comprised of data samples organized
into records of multiple samples, with each sample occupying a
unique position within its record, said encoded signal
represented as a combination of a plurality of factors, each factor
being the product of a score signal and a load signal, the
score signal defining the variation of data samples from record
to record and the load signal defining the relative variation
of a subgroup of samples in different positions of a record,
said method utilizing a reference pattern of samples,
comprising the steps of:

a. modifying at least one score signal to
achieve desired editing;

b. multiplying each load signal by its associated
modified score signal to produce each factor;

c. combining the factors produced in step a;

d. modifying the set of reference component
signal values according to the combined factors produced in
step b to produce the records of a reproduced input signal.

34. An apparatus for converting samples of an input
signal to an encoded signal composed of a plurality of
component signals each representing a characteristic of the input
signal in a different domain, said input signal being comprised
of data samples organized into records of multiple samples,
with each sample occupying a unique position within its record,
comprising means for encoding each record as a combination,
each component signal of a plurality of factors, each factor
being the product of a score signal and a load signal, the
score signal defining the variation of data samples from record
to record and the load signal defining the relative variation
of a subgroup of samples in different positions of a record.

35. The apparatus in accordance with claim 34 
further comprising means for generating a set of reference 
component signal values which represents a reference pattern of 
samples, means for producing for each record a plurality of 
component change signal values representing the input signal, 
each component change signal value being equal to the differ- 
ence between the reference pattern of samples and the record. 

36. The apparatus of claim 35 wherein each record
has the same number of samples arranged in a multi-dimensional
array, a first of said component signals representing the
magnitude of samples and a second of said component signals
representing the position of a sample in the array.

37. The apparatus of claim 36 wherein a component 
change signal may result in several pixels of the reference 
image being mapped to a common pixel of one of the frames, 
further comprising means for causing the intensity of the 
common pixel to be equal to a weighted sum of the intensities 
of the several pixels. 

38. The apparatus of claim 36 further comprising
means for providing a plurality of error signals each
corresponding to one of the component signals, each error signal
providing correction to the extent that the corresponding
component signal does not represent the corresponding
characteristic of the input signal within a predefined range.

39. The apparatus of claim 34 further comprising
means for providing a plurality of error signals each
corresponding to one of the component signals, each error signal
providing correction to the extent that the corresponding
component signal does not represent the corresponding
characteristic of the input signal within a predefined range.

40. The apparatus in accordance with claim 34
further comprising means for generating a set of reference
component signal values which represents a reference pattern of
samples, means for producing for each record a plurality of
component change signal values representing the input signal,
each component change signal value being equal to the
difference between the reference pattern of samples and the record.

41. The apparatus of claim 34 wherein each record
has the same number of samples arranged in a multi-dimensional
array, said means for encoding causing a first of said
component signals representing the magnitude of samples and a second
of said component signals representing the position of a sample
in the array.



42. The apparatus of claim 41 wherein the input 
signal is a conventional video signal, each sample is a pixel 
of a video image, each record is a frame of video, said first 
component signal represents pixel intensity and said second 
component signal represents the location of a pixel in a frame, 



43. The apparatus in accordance with claim 42
further comprising means for generating a set of reference
component signal values which represents a reference pattern of
samples, means for producing for each record a plurality of
component change signal values representing the input signal,
each component change signal value being equal to the differ-
ence between the reference pattern of samples and the record.

44. The apparatus of claim 43 wherein a component
change signal may result in several pixels of the reference
image being mapped to a common pixel of one of the frames, the
intensity of the common pixel being equal to a weighted sum of
the intensities of the several pixels.



45. The apparatus of claim 43 wherein a component
change signal may result in several pixels of the reference 
image being mapped to a common pixel of one of the frames, 
further comprising means for controlling the intensity of the 
common pixel to equal the difference between a constant and the 
sum of the intensities of the several pixels. 

46. The apparatus of claim 43 wherein a component
change signal may result in several pixels of the reference
image being mapped to a common pixel of one of the frames,
further comprising means for defining a depth for each of the
several pixels, and means for controlling the intensity of the
common pixel to be equal to the intensity of the pixel among
the several pixels which has the least depth.
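
Claims 44 to 46 recite three alternative rules for setting the intensity of a common pixel when several reference pixels are mapped onto it. The Python sketch below illustrates the three rules side by side. The function names, the weights, the constant 255 and the sample values are illustrative assumptions and are not taken from the claims.

```python
def weighted_sum(intensities, weights):
    """Claim 44 rule: weighted sum of the colliding pixel intensities."""
    return sum(w * i for w, i in zip(weights, intensities))


def constant_minus_sum(intensities, constant):
    """Claim 45 rule: a constant minus the sum of the intensities."""
    return constant - sum(intensities)


def least_depth(intensities, depths):
    """Claim 46 rule: intensity of the colliding pixel with the least depth."""
    index = min(range(len(depths)), key=depths.__getitem__)
    return intensities[index]


# Three reference pixels land on the same frame pixel (hypothetical values).
intensities = [100, 40, 60]
depths = [2.5, 0.5, 1.0]          # assumed depth assigned to each pixel
weights = [0.5, 0.25, 0.25]       # assumed weights

blended = weighted_sum(intensities, weights)
inverted = constant_minus_sum(intensities, 255)
front = least_depth(intensities, depths)
print(blended, inverted, front)
```

In this sketch the least-depth rule behaves like simple occlusion: the pixel assumed to be nearest the viewer determines the result.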

47. The apparatus of claim 43 wherein the reference 
image includes a collection of holons, the collection of holons 
containing every different holon appearing among all the frames 
of the input signal. 

48. The apparatus of claim 47 wherein the holons 
include a set of pixels exhibiting coordinated behavior in at 
least one domain, said means for encoding producing at least 
one of a load signal and score signal of at least one component 
signal which operates only on said set of pixels. 

49. An apparatus for decoding an encoded signal 
composed of a plurality of component signals in different 
domains to an input signal comprised of data samples organized 
into records of multiple samples, with each sample occupying a 
unique position within its record, said encoded signal repre-
sented as a combination of a plurality of factors, each factor
being the product of a score signal and a load signal, the
score signal defining the variation of data samples from record
to record and the load signal defining the relative variation
of a subgroup of samples in different positions of a record,
said apparatus utilizing a reference pattern of samples, 
comprising: 

a. means for multiplying each load signal by 
its associated score signal to produce each factor; 

b. means for combining the factors produced in
step a;

c. means for modifying the set of reference 
component signal values according to the combined factors 
produced in step b to produce the records of a reproduced input 
signal. 
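
The decoding steps a to c of claim 49 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a single domain, factors combined by summation, and the reference values modified by simple addition. The function name `decode` and the sample data are hypothetical.

```python
def decode(reference, loads, scores):
    """Rebuild records from a reference pattern and score/load factors.

    reference: K reference values, one per sample position
    loads:     F lists of K values (relative variation across positions)
    scores:    F lists of N values (variation from record to record)
    """
    n_records = len(scores[0]) if scores else 0
    records = []
    for n in range(n_records):
        # Steps a and b: multiply each load by its score for record n,
        # then combine the factors (assumed here to be a plain sum).
        change = [
            sum(score[n] * load[k] for score, load in zip(scores, loads))
            for k in range(len(reference))
        ]
        # Step c: modify the reference values (assumed additive).
        records.append([r + c for r, c in zip(reference, change)])
    return records


reference = [10, 20, 30]
loads = [[1, 0, -1], [0, 2, 0]]      # two factors, three positions
scores = [[0, 1, 2], [1, 0, -1]]     # two factors, three records
records = decode(reference, loads, scores)
print(records)
```

Only the short score and load lists and one reference pattern need to be stored or transmitted; the records themselves are regenerated on demand.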

50. An apparatus as in claim 49 further comprising a
storage medium containing at least one of the load signals and 
score signals. 



51. An apparatus as in claim 49, wherein the storage
medium also contains the reference component signal values.

52. An apparatus as in claim 49 further comprising
means for receiving at least one of the load signals and score 
signals from a remote location over a communications medium. 

53. The apparatus of claim 52 wherein the reference 
component signal values are also received over the communica- 
tions medium. 



54. An apparatus for editing an encoded signal 
composed of a plurality of component signals in different 
domains to an input signal comprised of data samples organized 
into records of multiple samples, with each sample occupying a
unique position within its record, said encoded signal repre- 
sented as a combination of a plurality of factors, each factor 
being the product of a score signal and a load signal, the 
score signal defining the variation of data samples from record 
to record and the load signal defining the relative variation 
of a subgroup of samples in different positions of a record, 
said apparatus utilizing a reference pattern of samples, 
comprising: 

a. means for modifying at least one score 
signal to achieve desired editing; 

b. means for multiplying each load signal by 
its associated modified score signal to produce each factor; 

c. means for combining the factors produced in
step a;

d. means for modifying the set of reference
component signal values according to the combined factors
produced in step b to produce the records of a reproduced input
signal.
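
The editing flow of claim 54 can be illustrated with a short Python sketch in which one score signal is modified before the factors are formed and applied to the reference. Scaling a score by a gain is only one assumed example of an edit. The function name `edit_and_decode`, the additive combination and the sample data are likewise hypothetical.

```python
def edit_and_decode(reference, loads, scores, factor_index, gain):
    """Scale one score signal, then rebuild the records."""
    # Modify one score signal (assumed edit: multiply by a gain).
    edited = [list(s) for s in scores]   # copy so the input is untouched
    edited[factor_index] = [gain * v for v in edited[factor_index]]

    n_records = len(edited[0]) if edited else 0
    records = []
    for n in range(n_records):
        # Multiply each load by its edited score, sum the factors, and
        # add the result to the reference (additive change assumed).
        records.append([
            r + sum(s[n] * l[k] for s, l in zip(edited, loads))
            for k, r in enumerate(reference)
        ])
    return records


reference = [10, 20]
loads = [[1, -1]]        # one factor, two positions
scores = [[1, 2]]        # one factor, two records
edited_records = edit_and_decode(reference, loads, scores, 0, 2)
print(edited_records)
```

Because the edit touches only the score signal, the load signal and the reference pattern are reused unchanged.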



55. A system comprising a reading apparatus and a 
data carrier containing data and adapted to be decoded accord- 
ing to the method of any one of claims 28-32. 

56. A system comprising a recording apparatus and a 
data carrier containing an encoded signal produced by the 
method of any one of claims 1-28. 

57. A system comprising a reading apparatus and a 
data carrier comprising data and adapted to be decoded by the
apparatus of any one of claims 49-53. 

58. A system comprising a recording apparatus and a 
data carrier containing an encoded signal produced by the 
apparatus of any one of claims 34-48. 

59. A system comprising a recording apparatus, a 
data carrier and a reading apparatus, wherein the data carrier 
contains an encoded signal produced according to the method of 
any one of claims 1-28 and adapted to be decoded by the method 
of any one of claims 28-32. 

60. A system comprising a recording apparatus, a
data carrier and a reading apparatus, wherein the data carrier 
contains an encoded signal produced by the apparatus of any one 
of claims 34-48 and adapted to be read by the apparatus of any 
one of claims 49-53. 

61. A data carrier containing data recorded thereon 
and adapted to be decoded by the method of any one of claims 
28-32. 

62. A data carrier containing an encoded signal 
produced by the method of any one of claims 1-28. 

63. An apparatus producing a transmitted signal 
containing an encoded signal produced by the method of any one 
of claims 1-28.



64. The encoded signal produced by the method of any 
one of claims 1-28 provided on one of a storage medium and a 
transmission medium.